  1. May 2025
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Public review:

      In this study, Porter et al report on outcomes from a small, open-label, pilot randomized clinical trial comparing dornase-alfa to the best available care in patients hospitalized with COVID-19 pneumonia. As the number of randomized participants is small, investigators describe also a contemporary cohort of controls and the study concludes about a decrease of inflammation (reflected by CRP levels) after 7 days of treatment but no other statistically significant clinical benefit.

      Suggestions to the authors:

      • The RCT does not follow CONSORT statement and reporting guidelines

      We thank you for this suggestion and have now amended the order and content of the manuscript to follow the CONSORT statement as closely as possible.

      • The authors have chosen a primary outcome that cannot be at least considered as clinically relevant or interesting. After 3 years of the pandemic with so much research, why investigate if a drug reduces CRP levels as we already have marketed drugs that provide beneficial clinical outcomes such as dexamethasone, anakinra, tocilizumab and baricitinib?

      We thank the reviewer for bringing up this central topic. The answer to this question has both a historical and practical component. This trial was initiated in June of 2020 and was completed in June of 2021. At that time there were no known treatments for the severe immune pathology of COVID-19 pneumonia. In June 2020, dexamethasone data came out and we incorporated dexamethasone into the study design. It took much longer for all other anti-inflammatories to be tested. Hence, our decision to trial an approved endonuclease was based purely on basic science work on the pathogenic role of cell-free chromatin and NETs in murine sepsis and flu models and the ability of DNase I to clear them and reduce pathology in these animal models. In addition, evidence for the presence of cell-free chromatin components in COVID-19 patient plasma had already been communicated in a pre-print. Finally, several studies had reported the anti-inflammatory effects of dornase treatment in CF patients. Hence there was a strong case for a cheap, safe, pulmonary noninvasive treatment that could be self-administered outside the clinical setting.

      The identification of novel/repurposed treatments effective for COVID-19 was hampered by patient recruitment to competing studies during a pandemic. This resulted in small studies with inconclusive or contrary findings. In general, effective treatments were only picked up in very large RCTs. For example, demonstrating dexamethasone as effective in COVID-19 required recruitment of 6,425 patients into the RECOVERY study. Multiple trials with anti-IL-6 gave conflicting evidence until RECOVERY recruited 4,116 adults with COVID-19 (n=2,022, tocilizumab and 2,094, control); the same was true for baricitinib (4,148 randomised to treatment and 4,008 to standard care). Anakinra is approved for patients with elevated suPAR, based on data from one randomized clinical trial of 594 patients, of whom 405 had active treatment (PMID: 34625750). However, a systematic review analysing over 1,627 patients (anakinra 888, control 739) with COVID-19 showed no benefit (PMID: 36841793). Regarding the choice of the primary endpoint, there is a wealth of clinical evidence to support the relevance of CRP as a prognostic marker for COVID-19 pneumonia patients and it is a standard diagnostic and prognostic clinical parameter in infectious disease wards. This choice in March 2020 was supported by evidence of the prognostic value of IL-6; CRP is a surrogate of IL-6. We also provide our own data from a large study of severe COVID-19 pneumonia in figure 1, showing how well CRP correlates with survival.

      In summary, our data suggest that Dornase yields an anti-inflammatory effect that is comparable or potentially superior to cytokine-blocking monotherapies at a fraction of the cost and potentially without additional adverse effects such as an increased risk of co-infections.

      We now provide additional justification on these points in the introduction on pg.4 as follows:

      “The trial was initiated in June 2020 and was completed in September of 2021. At the start of the trial only dexamethasone had been proven to benefit hospitalized COVID-19 pneumonia patients and was thus included in both arms of the trial. To increase the chance of reaching significance under challenging constraints in patient access, we opted to increase our sample size by using a combination of randomized individuals and available CRP data from matched contemporary controls (CC) hospitalized at UCL but not recruited to a trial. These approaches demonstrated that when combined with dexamethasone, nebulized DNase treatment was an effective anti-inflammatory treatment in randomized individuals with or without the implementation of CC data.”

      We also added the following explanation in the discussion on pg. 16:

      “Our study design offered a solution to the early screening of compounds for inclusion in larger platform trials. The study took advantage of frequent repeated measures of quantifiable CRP in each patient, to allow a smaller sample size to determine efficacy/futility than if powered on clinical outcomes. We applied a CRP-based approach that was similar to the CATALYST and ATTRACT studies. CATALYST showed in much smaller groups (usual care, 54, namilumab, 57 and infliximab, 35) that namilumab, an antibody that blocks the cytokine GM-CSF, reduced CRP even in participants treated with dexamethasone, whereas infliximab, which targets TNF-α, had no significant effect on CRP. This led to a suggestion that namilumab should be considered as an agent to be prioritised for further investigation in the RECOVERY trial. A direct comparison of our results with CATALYST is difficult due to the different nature of the modelling employed in the two studies. However, in general Dornase alfa exhibited comparable significance in the reduction in CRP compared to standard of care as described for namilumab at a fraction of the cost. Furthermore, endonuclease therapies may prove superior to cytokine blocking monotherapies, as they are unlikely to increase the risk for microbial co-infections that have been reported for antibody therapies that neutralize cytokines that are critical for immune defence such as IL-1β, IL-6 or GM-CSF.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      This information is provided in the analysis on pg. 8:

      “The primary outcome was the least square (LS) mean CRP up to 7 days or at hospital discharge whichever was sooner.”

      • Why day 35 was chosen for the read-out of the endpoint?

      We now state on pg. 8 that “Day 35 was chosen as being likely to include most early mortality due to COVID-19, being 4 weeks after completion of a week of treatment (i.e. d7 of treatment + 28 (4 x 7 days)).”

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      We initially aimed at a fully randomized trial. However, the swift implementation of trial prioritization strategies towards large and pre-established trial platforms in the UK made the recruitment of COVID-19 patients to small studies extremely challenging. Thus, we struggled to gain access to patients. Our power calculations suggested that a mixed trial with randomized and contemporary controls was the best way to achieve sufficient power under these restrictions in patient access.

      That being said, we also provide the primary endpoint (CRP) results in Fig. 3B as well as the results for the length of hospitalization (Fig. S3D) for the randomized subjects only.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      We apologize if this point was confusing. We performed the analysis on the ITT as defined in our SAP: “The primary analysis population will be all evaluable patients randomised to BAC + dornase alfa or BAC only who have at least one post-baseline CRP measurement, as well as matched historical comparators.”

      We understand that the reason this might be mistaken as an mITT is because the N in the ITT (39) doesn’t match the number randomised and because we had stated on pg. 8 that “Efficacy assessments of primary and secondary outcomes in the modified intention-to-treat population were performed.”

      However, we did randomise 41 participants, but:

      One participant in the DA arm never received treatment. The individual withdrew consent and was replaced. We also have no CRP data for this participant in the database, so they were unevaluable, and we couldn’t include them in the baseline table even if we wanted to. In addition, one participant in the BAC arm had only a baseline CRP measurement available and was therefore not evaluable.

      We have corrected the confusing statement on pg. 8 and added an additional explanation.

      “Efficacy assessments of primary and secondary outcomes in the intention-to-treat (ITT) population were performed on all randomised participants who had received at least one dose of dornase alfa if randomized to treatment. For full details see Statistical Analysis Plan. The ITT was adjusted to mitigate the following protocol violations where one participant in the BAC arm and one in the DA arm withdrew before they received treatment and provided only a baseline CRP measurement. The participant in the DA arm was replaced with an additional recruited patient. Exploratory endpoints were only available in randomised participants and not in the CC. In this case, a post hoc within group analysis was conducted to compare baseline and post-baseline measurements.”

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      Our protocol pre-specified that the primary analysis population should have at least one post-baseline CRP measurement (pg. 13 of protocol). The patient that was excluded was one that initially joined the trial but withdrew consent after the first treatment but before the first post-treatment blood sample could be drawn. Hence, the pre-treatment CRP of this patient alone provided no useful information.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatments as BAT received by those patients except for dexamethasone.

      Table 1 includes all 39 patients plus 60 CCs. Table 2 shows additional treatments given for COVID-19 as part of BAC.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      One of the main criticisms we have encountered in this study has been the choice of the primary endpoint. The best way to respond to these questions was to provide data to support the prognostic relevance of CRP in COVID-19 pneumonia from a separate independent study where no other treatments such as dexamethasone, anakinra or anti-IL6 therapies were administered. We think this is a very useful analysis that provides essential context for the trial and the choice of the primary endpoint, indicating that CRP has good enough resolution to predict clinical outcomes.

      • Propensity-score selected contemporary controls may introduce bias in favor of the primary study analysis, since controls are already adjusted for age, sex and comorbidities.

      The contemporary controls were selected to best match the characteristics of the randomized patients including that the first CRP measurement upon admission surpassed the trial threshold, so we do not see how this selection process introduces biases, as it was blinded with regards to the course of the CRP measurements. Given that this was a small trial, matching for baseline characteristics is necessary to minimize confounding effects.

      • The authors do not clearly present numerically survivors and non-survivors at day 34, even though this is one of the main secondary outcomes.

      We now provide the mortality numbers in the following paragraph on pg. 13.

      “Over 35 days follow up, 1 person in the BAC + dornase-alfa group died, compared to 8 in the BAC group. The hazard ratio observed in the Cox proportional hazards model (95% CI) was 0.47 (0.06, 3.86), which estimates that throughout 35 days follow-up, there was a 53% reduced chance of death at any given timepoint in the BAC + dornase-alfa group compared to the BAC group, though the confidence intervals are wide due to a small number of events. The p-value from a log-rank test was 0.460, which does not reach statistical significance at an alpha of 0.05.”
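      The reading of the hazard ratio quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it takes the reported point estimate and 95% CI as inputs, and the Wald approximation (recovering the standard error of log HR from the CI width) is an assumption introduced here for illustration, not the method used in the trial. The function names are hypothetical.

```python
import math

def percent_hazard_reduction(hr):
    """Relative reduction in hazard implied by a hazard ratio, in percent."""
    return (1.0 - hr) * 100.0

def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value from an HR and its 95% CI.

    Assumes the CI is symmetric on the log scale (Wald interval),
    so SE(log HR) = (log(upper) - log(lower)) / (2 * z_crit).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values as reported in the quoted paragraph (rounded in the source).
hr, ci_low, ci_high = 0.47, 0.06, 3.86

reduction = percent_hazard_reduction(hr)   # percent lower hazard
ci_includes_one = ci_low < 1.0 < ci_high   # True means "no effect" is not excluded
p_wald = wald_p_from_ci(hr, ci_low, ci_high)
```

      Because the interval spans 1, the data are compatible with no difference between groups. The Wald p-value computed from the rounded interval comes out at roughly 0.48, in the same range as the reported log-rank p-value; the two tests are not identical, so exact agreement is not expected.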

      • It is unclear why another cohort (Berlin) was used to associate CRP with mortality. CRP association with mortality should (also) be performed within the current study.

      As we explained above, the Berlin cohort CRP data serve to substantiate the relevance of CRP as a primary endpoint in a cohort that experienced sufficient mortality as this cohort did not receive any approved anti-inflammatory therapy. Mortality in our COVASE trial was minimal, since all patients were on dexamethasone and did not reach the highest severity grade, since we opted to treat patients before they deteriorated further. The overall mortality was 8% across all arms of our study, which does not provide enough events for mortality measurements. In contrast the Berlin cohort did not receive dexamethasone and all patients had reached a WHO severity grade 7 category with mortality at 30%.

      My other concerns are:

      • This report is about an RCT and the authors should follow the CONSORT reporting guidelines. Please amend the manuscript and Figure 1b accordingly and provide a CONSORT checklist.

      We now provide a CONSORT checklist and have amended the CONSORT diagram accordingly.

      • Please provide in brief the exclusion criteria in the main manuscript

      We have now included the exclusion criteria in the manuscript on pg. 6.

      “1.1.1 Exclusion criteria

      1. Females who are pregnant, planning pregnancy or breastfeeding

      2. Concurrent and/or recent involvement in other research or use of another experimental investigational medicinal product that is likely to interfere with the study medication within (specify time period e.g. last 3 months) of study enrolment

      3. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. oxygen saturation<=93% on high-flow oxygen

      4. Require mechanical invasive or non-invasive ventilation at screening

      5. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      6. Any major disorder that in the opinion of the Investigator would interfere with the evaluation of the results or constitute a health risk for the trial participant

      7. Terminal disease and life expectancy <12 months without COVID-19

      8. Known allergies to dornase alfa and excipients

      9. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period

      So briefly, patients were excluded if they were:

      1. pregnant, planning pregnancy or breastfeeding

      2. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. oxygen saturation<=93% on high-flow oxygen

      3. Require ventilation at screening

      4. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      5. Terminal disease and life expectancy <12 months without COVID-19

      6. Known allergies to dornase alfa and excipients

      7. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period”

      • "The final trial visit occurred at day 35." "Analysis included mortality at day 35". I am not sure I understand why. In clinicaltrials.gov all endpoints are meant to be studied at day 7 except for mortality rate day 28. Why day 35 was chosen? Please be consistent.

      Thank you for identifying this inconsistency. We have amended the record on clinicaltrials.gov to read “the time to event data was censored at 28 days post last dose (up to d35) for the randomised participants and at the date of the last electronic record for the CC.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      • Figure 1b as in CONSORT statement, please provide reasons why screened patients were not enrolled.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatment as BAT received those patients except for dexamethasone.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      • In Figure 2 the authors draw results about ITT although in methods describe that they performed an mITT analysis. Please be consistent.

      Please see answers provided to these queries above.

      Reviewer #2 (Recommendations For The Authors):

      1) Suppl Figure 2B would be more informative if presented as a Table with N of patients with per day sampling

      We now provide the primary end point daily sampling table in Table 3.

      2) The numbers at risk should figure under the KM curves

      The numbers at risk for figures 1E, 2C, 2D have been added as graphs either in the main figures or in the supplement.

      3) HD in Supplementary figure 3 should be explained

      We apologize for this omission. We now provide a description for the healthy donor samples that we used in the cell-free DNA measurements in figure S3B on pg. 14:

      “Compared to the plasma of anonymized healthy donor volunteers at the Francis Crick Institute (HD), plasma cf-DNA levels were elevated in both BAC and DA-treated COVASE participants.”

      4) Presentation is inappropriate for Table S4

      We thank the reviewer for pointing out this issue. We have now formatted Table S4 to be consistent with all other tables.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript is a focused investigation of the phospho-regulation of a C. elegans kinesin-2 motor protein, OSM-3. In C. elegans sensory cilia, the kinesin-2 motor proteins, the Kinesin-II complex and the OSM-3 homodimer, transport IFT trains anterogradely to the ciliary tip. Kinesin-II carries OSM-3 as an inactive passenger from the ciliary base to the middle segment, where kinesin-II dissociates from IFT trains and OSM-3 gets activated and transports IFT trains to the distal segment. Therefore, activation/inactivation of OSM-3 plays an essential role in its ciliary function.

      Strengths:

      In this study, using mass spectrometry, the authors have shown that the NEKL-3 kinase phosphorylates a serine/threonine patch at the hinge region between coiled coils 1 and 2 of an OSM-3 dimer, referred to as the elbow region in ubiquitous kinesin-1. Phosphomimic mutants of these sites inhibit OSM-3 motility both in vitro and in vivo, suggesting that this phosphorylation is critical for the autoinhibition of the motor. Conversely, phospho-dead mutants of these sites hyperactivate OSM-3 motility in vitro and affect the localization of OSM-3 in C. elegans. The authors also showed that an Alanine to Threonine mutation of one of the phosphorylation sites rescues OSM-3 function in live worms.

      Weaknesses:

      Collectively, this study presents evidence for the physiological role of OSM-3 elbow phosphorylation in its autoregulation, which affects ciliary localization and function of this motor. Overall, the work is well performed, and the results mostly support the conclusions of this manuscript. However, the work will benefit from additional experiments to further support conclusions and rule out alternative explanations, filling some logical gaps with new experimental evidence and in-text clarifications, and improving writing before I can recommend publication.

      We appreciate Reviewer #1’s comments and suggestions. We have now provided additional evidence and discussion to further support our conclusions and fill the logical gaps. We have also provided alternative explanations for our data and improved the writing.

      Reviewer #2 (Public review):

      Summary:

      The regulation of kinesin is fundamental to cellular morphogenesis. Previously, it has been shown that OSM-3, a kinesin required for intraflagellar transport (IFT), is regulated by autoinhibition. However, it remains totally elusive how the autoinhibition of OSM-3 is released. In this study, the authors have shown that NEKL-3 phosphorylates OSM-3 and releases its autoinhibition.

      The authors found NEKL-3 directly phosphorylates OSM-3 (although the method is not described clearly) (Figure 1). The phosphorylated residue is the "elbow" of OSM-3. The authors introduced phospho-dead (PD) and phospho-mimic (PM) mutations by genome editing and found that the OSM-3(PD) protein does not form cilia, and instead, accumulates at the axonal tips. The phenotype is similar to another constitutive active mutant of OSM-3, OSM-3(G444E) (Imanishi et al., 2006; Xie et al., 2024). osm-3(PM) has shorter cilia, which resembles loss-of-function mutants of osm-3 (Figure 3). The authors did structural prediction and showed that G444E and PD mutations change the conformation of OSM-3 protein (Figure 3). In the single-molecule assays G444E and PD mutations exhibited increased landing rate (Figure 4). By unbiased genetic screening, the authors identified a suppressor mutant of osm-3(PD), in which A489T occurs. The result confirms the importance of this residue. Based on these results, the authors suggest that NEKL-3 induces phosphorylation of the elbow domain and inactivates OSM-3 motor when the motor is synthesized in the cell body. This regulation is essential for proper cilia formation.

      Strengths:

      The finding is interesting and gives new insight into how the IFT motor is regulated.

      Weaknesses:

      The methods section has not presented sufficient information to reproduce this study.

      We appreciate that Reviewer #2 is also positive about our study. We have now provided sufficient information in the revised Methods section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Concerns

      (1) Why do the authors think that NEKL-3 phosphorylates OSM-3 in the first place? This seems to come out of nowhere and prior evidence indicating that NEKL-3 may be phosphorylating OSM-3 is not even mentioned in the Introduction.

      We thank the Reviewer for raising this important point. Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “...In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26:  

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region ...”

      (2) The authors need to characterize the proteins they expressed and purified for in vitro ATPase and motility assays. Are these proteins monomers or dimers?

      For our in vitro ATPase and motility assays, OSM-3 was expressed in E. coli BL21(DE3) and purified using established protocols (Xie et al., EMBO J, 2024, PMID: 38806659; Imanishi et al., JCB, 2006, PMID: 17000874). To confirm its oligomeric state, we analyzed recombinant OSM-3 by size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS). As reported in Xie et al. (2024), OSM-3 (~80 kDa monomer) elutes with a molecular weight of 173–193 kDa under physiological buffer conditions, consistent with a homodimeric assembly. These findings confirm that the functional unit used in our assays is the biologically relevant dimer. This characterization has been added to the revised manuscript on Page #35, Line #7.

      “…OSM-3 was expressed in E. coli BL21(DE3) and purified for in vitro assays using established protocols (REFs). Size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS) (Xie et al., EMBO J., 2024) confirmed that recombinant OSM-3 forms a homodimer (173–193 kDa) under physiological conditions, ensuring its dimeric state remained intact....” 
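      The inference from the SEC-MALS mass range to a dimeric assembly can be illustrated with a simple ratio check. This is a minimal sketch, assuming only the two figures quoted in the response (a monomer of about 80 kDa and a measured range of 173–193 kDa); the helper names are hypothetical and the rounding rule is a simplification, not part of the authors' analysis.

```python
def subunit_ratio_range(measured_kda, monomer_kda):
    """Return (low, high) ratios of measured complex mass to monomer mass."""
    low, high = measured_kda
    return low / monomer_kda, high / monomer_kda

def nearest_oligomer(ratio):
    """Nearest whole number of subunits, never less than one."""
    return max(1, round(ratio))

monomer_kda = 80.0             # approximate monomer mass quoted in the response
measured_kda = (173.0, 193.0)  # SEC-MALS mass range quoted in the response

ratios = subunit_ratio_range(measured_kda, monomer_kda)
states = {nearest_oligomer(r) for r in ratios}  # set of implied subunit counts
```

      Both ends of the measured range round to the same subunit count, which is the sense in which the range is described as consistent with a homodimer. Because the monomer mass is only approximate, the ratios are not expected to be exact integers.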

      (3) The authors primarily used PD and PM mutations, which affect all four amino acids in the region. This may or may not be physiologically relevant. Figure 5 indicates that T489 is a critical regulatory site. However, this conclusion is undermined by reliance on PD mutations, which affect all four amino acids. Creating PM (T489E) and PD (T489A) mutations based on WT OSM-3 would better reflect physiological relevance. In vitro assays with a single phosphomimic or phospho-dead mutation at residue 489 are missing at the end of this story. This would better link Figure 5 with the rest of the manuscript.

      We thank the reviewer for this constructive critique. Below, we address the concerns and integrate new data to strengthen the link between T489 and autoinhibition:

      To probe the regulatory role of T489 phosphorylation, we generated osm-3(T489E) (phosphomimetic, PM) and osm-3(T489A) (phospho-dead, PD) mutant animals. Strikingly, both mutants formed axonal puncta (Figure S7), recapitulating the hyperactive phenotype of the OSM-3(G444E) mutant. While the similar puncta formation in PM and PD mutants initially appeared paradoxical, this observation underscores the necessity of dynamic phosphorylation cycling at T489 for proper autoinhibition. Specifically, the PD mutant (T489A) likely disrupts phosphorylation-dependent stabilization of autoinhibition, leading to constitutive activation, whereas the PM mutant (T489E) may mimic a "locked" phosphorylated state, preventing dephosphorylation-dependent release of autoinhibition in cilia and trapping OSM-3 in an aggregation-prone conformation. These results highlight T489 as a structural linchpin whose post-translational modification dynamically regulates motor activity. While the precise molecular mechanism, such as how phosphorylation modulates tail-motor domain interactions, remains to be elucidated, our data conclusively demonstrate that perturbing T489 (even in isolation) destabilizes autoinhibition, driving puncta formation and constitutive activity.

      We have integrated the above paragraph in the revised manuscript on page #8, line #27.

      (4) There seems to be a disconnect between the MT gliding assays in Figure 4C and single molecule motility assays in Figure 4E. The gliding assays show that all constructs can glide microtubules at near WT speeds. Yet, the motility assays show that WT and PM cannot land or walk on MTs. The authors need to explain why this is the case. Is this because surface immobilization of kinesin from its tail disrupts autoinhibition? Alternatively, the protein preparation may include monomers that cannot be autoinhibited and cannot land and processively walk on surface-immobilized microtubules (because they only have one motor domain) but can glide microtubules when immobilized on the surface from their tail.

      The surface immobilization of OSM-3 via its tail domain disrupts autoinhibition, a phenomenon previously observed in other kinesins such as kinesin-1 (Nitzsche et al, Methods Cell Biol., 2010, PMID: 20466139). In our assays, OSM-3 was nonspecifically immobilized on glass surfaces, enabling microtubule gliding by motors whose autoinhibition was relieved through tail anchoring. Critically, the PD and PM mutations reside in the tail region and do not alter the intrinsic properties of the motor head domain. Consequently, once autoinhibition is released via immobilization, the gliding velocities reflect the conserved motor head activity, which is expected to remain comparable across all constructs. While we cannot entirely rule out the presence of monomeric OSM-3 in solution, several lines of evidence argue against this possibility. First, the mutations are located in the elbow region, which is dispensable for motor dimerization. Second, SEC-MALS analysis from prior studies confirms that purified OSM-3 exists predominantly as dimers in solution. 

      We have discussed these issues in the revised text on page #10, line #18: 

      “…In our gliding assays, OSM-3-PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4 C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G). Although monomeric OSM-3 could theoretically mediate single-motor gliding, the previous SEC-MALS data demonstrate that OSM-3 purifies as stable dimers (Xie et al., EMBO J, 2024, PMID: 38806659). Thus, dimeric OSM-3 is likely the predominant functional species in our assays…”

      (5) An alternative explanation for the data is that both PD and PM mutations result in loss-of-function effects, disrupting OSM-3 activity. For instance:

      a) In Figure 2C, both mutations cause shorter cilia than the wild type (WT).

      b) In Figure 4A, both mutations result in higher ATPase activity than WT.

      c) In Figure 4D, both mutations show increased gliding velocity compared to WT.

      These results suggest the observed effects could stem from loss of function rather than phosphorylation-specific regulation.

      Although PD and PM mutations exhibit superficially similar "loss-of-function" phenotypes in certain assays, they mechanistically disrupt motor regulation in distinct ways:

      a) Ciliary Length (Figure 2C)

      PD Mutants: Hyperactivation causes OSM-3-PD to prematurely aggregate into axonal puncta, preventing ciliary entry. Consequently, cilia are built solely by the weaker Kinesin-II motor, which only constructs shorter middle segments.

      PM Mutants: OSM-3-PM retains autoinhibition during transport (enabling ciliary entry) but cannot be dephosphorylated in cilia. This blocks activation, leaving OSM-3-PM partially functional and resulting in cilia intermediate in length between WT and PD.

      We have discussed this issue in the revised text on page #5, line #30:

      “…These findings indicate that OSM-3-PM is in an autoinhibited state capable of ciliary delivery, yet fails to achieve full activation due to defective dephosphorylation. This incomplete activation results in suboptimal motor function and intermediate ciliary length phenotypes (Fig.2 B-C). In contrast, OSM-3-PD exhibits constitutive activation leading to aggregation into axonal puncta, which completely abolishes its ciliary entry capacity (Fig.2 A-B)...”

      b) ATPase Activity (Figure 4A)

      PD Mutants: Fully autoinhibition-released (98.15% of KHC ATPase activity), consistent with constitutive activation.

      PM Mutants: Show partial ATPase activity (34.28% of KHC), reflecting imperfect phosphomimicry. While the DDEE substitution introduces negative charges, it fails to fully replicate the steric/kinetic effects of phosphorylated tyrosine (Y487; phenyl ring absent), resulting in incomplete autoinhibition stabilization. Despite this, the residual inhibition is sufficient to phenocopy shorter cilia in vivo.

      We have discussed this issue in the revised text on page #7, line #19:

      “…The PM mutant’s partial ATPase activity (34.28% of KHC) might arise from imperfect phosphomimicry—while the DDEE substitution introduces negative charges, it lacks the steric bulk of phosphorylated tyrosine (pY487). And this incomplete mimicry allows residual autoinhibition, sufficient to limit ciliary construction in vivo...”

      c) Microtubule Gliding Velocity (Figure 4D)

      Gliding Assay Limitation: Tail immobilization artificially releases autoinhibition, masking regulatory differences. Thus, all constructs (PD, PM) exhibit similar velocities (~0.7 µm/s), reflecting conserved motor head activity.

      Single-Molecule Assay (Figure 4E): Directly resolves native autoinhibition states:

      PD mutants show robust motility (autoinhibition released).

      PM mutants remain largely inactive (autoinhibition retained).

      We have discussed this issue in the revised text on page #10, line #18:

      “…In our gliding assays, OSM-3-PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4 C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G)...”

      Minor Suggestions and Concerns

      (1) Lines 60-66: References that support these observations are missing from this section.

      We have added the relevant references.

      (2) Lines 66-67: I would revise this sentence as "It remains unclear how OSM-3 becomes enriched...".

      We have made the changes.

      (3) Line 85: The authors should describe how they perform these assays (i.e. recombinantly expressed NEKL-3 and OSM-3, are these C. elegans proteins, and which expression system was used...).

      We have described them in the main text and Methods:

      Page #4 line #26

      “...To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays...”

      Page #35 line#12

      “...Briefly, point mutations were introduced into the pET.M.3C OSM-3-eGFP-His6 plasmid for prokaryotic expression. Plasmid-transformed E. coli (BL21) was cultured at 37°C and induced overnight at 23°C with 0.2 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH8.0, 250 mM NaCl, 20 mM imidazole, 10 mM bME, 0.5 mM ATP, 1 mM MgCl2, Complete Protease Inhibitor Cocktail (Roche)) and Ni-NTA beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH6.0, 250 mM NaCl, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2) and eluted with elution buffer (50 mM NaPO4 pH7.2, 250 mM NaCl, 500 mM imidazole, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2). Protein concentration was determined by standard Bradford assay. C. elegans nekl-3 cDNA was cloned into the pGEX-6P GST vector, expressed in E. coli BL21 (DE3), and purified for in vitro phosphorylation assays. Plasmid-transformed E. coli (BL21) was cultured at 37°C and induced overnight at 18°C with 0.5 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH8.0, 250 mM NaCl, 1 mM DTT, Complete Protease Inhibitor Cocktail (Roche)) and GST beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH6.0, 250 mM NaCl, 1 mM DTT) and eluted with elution buffer (50 mM NaPO4 pH7.2, 150 mM NaCl, 10 mM GSH, 1 mM DTT). Purified proteins were dialyzed against storage buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl). Protein concentration was determined by standard Bradford assay...”

      (4) Line 141: The first sentence of this paragraph lacks motivation. I would start this sentence with "To directly observe the effects of phosphor mutants in the elbow region in microtubule binding and motility of OSM-3, we...".

      We have made the change.

      (5) Figure 1B: The mass spectrometry data in Figure 1B lacks adequate explanation. The Methods section should detail the experimental protocol, data interpretation, and any databases used. Additionally, the manuscript should list all identified phosphorylation sites on OSM-3 to provide context, including whether Y487_T490 is the major site.

      We have provided the detailed experimental protocol, data interpretation, and databases used in methods. We have provided all identified sites as Appendix table S1.

      (6) Figure 1C: Is it possible to model the effect of PM and PD mutations using AlphaFold? The authors should also show PAE or pLDDT scores of their model.

      AlphaFold cannot reliably model the effects of point mutations, so we ran Rosetta relax to capture possible conformational changes, as shown in the revised Figure 3. We have provided PAE and pLDDT scores as a new figure, Figure S2.

      (7) Figure 2D: The unit for speed should use a lowercase "s" for seconds.

      We have fixed it.

      (8) Figure 3: I am not sure whether this figure stands for a main text figure on its own, as it is only a Rosetta prediction and is not supported by any experimental data. In addition, it remains unclear what the labels on the x-axis mean.

      We have updated the figure and explained the x-axis labels in Figure S4 to make it more reader-friendly.

      (9) Figure 4: NEKL-3-treated OSM-1 should be included as a positive control in the in vitro experiments.

      We suspect that the Reviewer asked for NEKL-3-treated OSM-3. 

      In another study from our group, which has just been accepted by the Journal of Cell Biology, NEKL-3 treatment significantly reduced the affinity of the OSM-3 motor for microtubules and resulted in very low ATPase activity. We have cited and discussed this in the revised text on page #10, line #28:

      “…As demonstrated in our recent study (Huang et al., JCB, 2025, In press, attached), phosphorylation of OSM-3 by NEKL-3 at two distinct regions, Ser96 and the conserved "elbow" motif, differentially regulates its activity and localization. Phosphorylation at Ser96 reduces OSM-3’s ATPase activity and alters its ciliary distribution from the distal segment to a uniform localization, while elbow phosphorylation induces autoinhibition, retaining OSM-3 in the cell body. Strikingly, in vitro phosphorylation of OSM-3 by NEKL-3 significantly reduces its microtubule-binding affinity, likely arising from combined modifications at both sites. We propose a model wherein elbow phosphorylation ensures anterograde ciliary transport, while Ser96 phosphorylation fine-tunes distal segment targeting. This multistep regulation may involve distinct phosphatases to reverse phosphorylation at specific sites, a hypothesis warranting further investigation…”

      (10) Figure 4C, D, and F: The unit of velocity is wrong. The authors should use the same units they used in the table shown in Figure 4B.

      We have fixed these errors.

      (11) Figure 4F: The velocity of PD is a lot lower than G444E. Therefore, it would be more appropriate to refer to PD as partially active, rather than hyperactive.

      We have made the change. 

      (12) Figure 5: There is too much genetics jargon on this figure (EMF, F2, 100%Dyf,...). How are the alleles numbered? Is it OK to refer to them as Alleles 1 and 2 for simplicity?

      According to the established C. elegans allele nomenclature, each worm allele has a unique number named after the lab code for identification. We have simplified the labels and updated the figure to make it more reader-friendly.

      (13) Figure 5E: A plot would be more reader-friendly than a table. Additionally, the legend for Fig. 5E mistakenly refers to it as "D."

      We have changed the table to a plot and fixed the mistakes. We thank the Reviewer for pointing them out.

      Reviewer #2 (Recommendations for the authors):

      (1) The model appears as if NEKL-3 induces dephosphorylation of OSM-3 (Figure 6). This is not consistent with the conclusions described in the Discussion and is confusing.

      We have updated the model figure and fixed the error.

      (2) It should be described why the authors hypothesized that NEKL-3 phosphorylates OSM-3. Was there genetic evidence? Did the authors screen cilia-related kinases? Or did the authors identify it incidentally? Providing this information would help readers to understand the context of the research.

      We thank both Reviewers for pointing out this issue.

      Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “... In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26: 

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      (3) It is curious that the authors have not addressed the cilia phenotype and the localization of OSM-3 in nekl-3 mutants. Regardless of whether these observations agree with the proposed mechanisms, it is essential for the authors to show and discuss the cilia phenotype and OSM-3 localization in nekl-3 mutants.

      We thank the Reviewer for highlighting this critical point. nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), precluding direct analysis of ciliary phenotypes. To bypass this limitation, we recently generated nekl-3 conditional knockouts (cKOs) in ciliated neurons (Huang et al., JCB, 2025 in press, attached). In these mutants, OSM-3, which is normally enriched in the ciliary distal segment, becomes uniformly distributed along the cilium. This redistribution correlates with premature activation of OSM-3-driven anterograde motility in the ciliary middle region, consistent with our proposed model in which NEKL-3 phosphorylation suppresses OSM-3 activity. We have now integrated this result and discussion into the revised manuscript, reinforcing the physiological relevance of NEKL-3-mediated regulation in ciliary transport.

      Page #6 line #10

      “… While nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), conditional knockout (cKO) of nekl-3 in ciliated neurons (Huang et al., JCB, 2025 in press, attached) revealed its critical role in regulating OSM-3 dynamics. In nekl-3 cKO animals, OSM-3, normally enriched in the ciliary distal segment, redistributed uniformly along the cilium, concomitant with premature activation of anterograde motility in the middle ciliary region. This phenotype aligns with our model wherein NEKL-3 phosphorylation suppresses OSM-3 activity, ensuring spatiotemporal regulation of IFT…”

      (4) The methods section lacks some information, which is critical to reproducing this study.

      We have now provided detailed information in the methods section in the revised manuscript.

      (a) It is not described how the authors determined phosphorylation of OSM-3 by NEKL-3. In methods, nothing is described about the assay.

      We performed in vitro phosphorylation assays using recombinant OSM-3 and NEKL-3 purified from bacteria. We then used LC-MS/MS to identify phosphorylation sites. We have now updated the Methods section to include all the information.

      Page #4 line #26

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      Page #36, line #19

      “In vitro phosphorylation assay

      20 μM purified OSM-3 was incubated with 1 μM GST-NEKL-3 at 30 °C in 100 μL reaction buffer (50 mM Tris-HCl pH 8.0, 10 mM MgCl2, 150 mM NaCl, and 2 mM ATP) for 30 min. The reaction was terminated by boiling for 5 min in SDS sample buffer.

      Mass spectrometry

      Following NEKL-3 treatment, OSM-3 proteins were resolved by SDS-PAGE and visualized with Coomassie Brilliant Blue staining. Protein bands corresponding to OSM-3 were excised and subjected to digestion using the following protocol: reduction with 5 mM TCEP at 56°C for 30 min; alkylation with 10 mM iodoacetamide in darkness for 45 min at room temperature, and tryptic digestion at 37°C overnight with a 1:20 enzyme-to-protein ratio. The resulting peptides were subjected to mass spectrometry analysis. Briefly, the peptides were analyzed using an UltiMate 3000 RSLCnano system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific). We applied an in-house proteome discovery searching algorithm to search the MS/MS data against the C. elegans database. Phosphorylation sites were determined using PhosphoRS algorithm with manual validation of MS/MS spectra.”
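
      As a quick check on the reaction composition quoted above, the absolute amounts of each component follow from concentration × volume. The sketch below is illustrative arithmetic only: the concentrations and volume come from the quoted Methods text, while the helper name `nmol` is hypothetical and not part of the authors' protocol.

```python
# Illustrative arithmetic for the in vitro phosphorylation reaction quoted above.
# Concentrations and volume are taken from the Methods text; the helper name
# `nmol` is a hypothetical convenience, not something the authors define.

def nmol(conc_uM: float, volume_uL: float) -> float:
    """Amount in nmol: (umol/L) * (uL) gives pmol; divide by 1000 for nmol."""
    return conc_uM * volume_uL / 1000.0

VOLUME_UL = 100.0                  # reaction volume
osm3 = nmol(20.0, VOLUME_UL)       # 20 uM OSM-3 (substrate)
nekl3 = nmol(1.0, VOLUME_UL)       # 1 uM GST-NEKL-3 (kinase)
atp = nmol(2000.0, VOLUME_UL)      # 2 mM ATP expressed as 2000 uM

print(f"OSM-3: {osm3:.1f} nmol, GST-NEKL-3: {nekl3:.1f} nmol, ATP: {atp:.0f} nmol")
print(f"substrate:kinase = {osm3 / nekl3:.0f}:1, ATP:substrate = {atp / osm3:.0f}:1")
```

      Under these conditions the substrate is in 20-fold molar excess over the kinase, and ATP is in 100-fold molar excess over the substrate.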

      (b) The method of structural prediction by AlphaFold2 and LocalColabFold needs clarification. In general, the prediction gives several candidates. How did the authors choose one of these candidates?

      We generated five candidate models, all of which showed similar conformations. We therefore chose the model with the highest confidence. We have provided PAE and pLDDT as additional data in Figure S2 and discussed them in the revised text on page #4, line #32:

      “...To gain structural insights from this motif, we employed LocalColabFold based on AlphaFold2 to predict the dimeric structure of OSM-3 (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). The highest-confidence model was selected for further analysis (Fig. 1C, Fig. S2)...”
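
      One simple way to make a "highest-confidence" selection reproducible is to rank the candidates by a summary confidence score. The sketch below assumes mean per-residue pLDDT as that score; the response does not state which metric was used, and the model names and scores are made-up placeholders.

```python
# Sketch: pick the highest-confidence model among several predicted candidates
# by mean per-residue pLDDT. The ranking metric is an assumption, and every
# score below is an invented placeholder rather than data from the study.
from statistics import mean

def pick_best_model(plddt_by_model: dict[str, list[float]]) -> str:
    """Return the name of the candidate with the highest mean pLDDT."""
    return max(plddt_by_model, key=lambda name: mean(plddt_by_model[name]))

candidates = {
    "model_1": [88.0, 91.5, 79.0],
    "model_2": [90.0, 92.0, 85.0],
    "model_3": [70.0, 95.0, 80.0],
}

best = pick_best_model(candidates)
print(best, round(mean(candidates[best]), 2))
```

      For dimer predictions, an interface-aware score could be substituted for mean pLDDT by changing only the `key` function.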

      (c) The methods to predict conformational changes by introducing various point mutations are interesting (Figure 3). However, the methods require more detailed descriptions. In the current form, the manuscript only lists the tools used. The pipelines and parameters need to be described. This information is important because AlphaFold-based predictions often give folded conformations because the training data are mainly composed of folded proteins. It is surprising that the methods applied here give open conformations induced by point mutations.

      We have described the pipelines in the revised Methods section on page#34, line#25: 

      “…The OSM-3 model was predicted using LocalColabFold (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). Mutated proteins were built in PyMOL 2.6, choosing, for each mutated residue in the G444E, PM and PD models, the rotamer with the fewest clashes as the initial conformation. To predict mutation-induced conformational changes, the initial models were processed with PyRosetta (Chaudhury et al., 2010). The energies of the pre-relaxed models were evaluated with the Rosetta Energy Function 2015 (Alford et al., 2017), and the relax procedure was then applied with default parameters to minimize the energy of the models. Specifically, the classic relax mover was used with default settings, and the relaxed models were visualized in PyMOL. The relax script has been uploaded to GitHub: https://github.com/young55775/RosettaRelax_for_OSM3...”
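
      The relax step described above can be sketched roughly as follows. This is a minimal illustration assuming a standard PyRosetta installation; it is not the authors' uploaded script, and the function name and file paths are hypothetical placeholders.

```python
# Minimal sketch of a score-then-relax step with PyRosetta. This is NOT the
# authors' script; the function name and file paths are hypothetical.
# PyRosetta is imported inside the function so that this snippet can be
# loaded on machines where it is not installed.

def relax_model(pdb_in: str, pdb_out: str) -> tuple[float, float]:
    """Score a model with ref2015, run ClassicRelax with default settings,
    write the relaxed model, and return (energy_before, energy_after)."""
    import pyrosetta
    from pyrosetta.rosetta.protocols.relax import ClassicRelax

    pyrosetta.init("-mute all")                        # start Rosetta quietly
    pose = pyrosetta.pose_from_pdb(pdb_in)             # e.g. a PyMOL-mutated model
    scorefxn = pyrosetta.create_score_function("ref2015")
    energy_before = scorefxn(pose)                     # energy of the pre-relaxed model

    relax = ClassicRelax()                             # classic relax mover, defaults
    relax.set_scorefxn(scorefxn)
    relax.apply(pose)                                  # modifies the pose in place

    energy_after = scorefxn(pose)
    pose.dump_pdb(pdb_out)                             # save for visualization
    return energy_before, energy_after
```

      A call such as `relax_model("osm3_G444E_initial.pdb", "osm3_G444E_relaxed.pdb")` (hypothetical file names) would return the energies before and after relaxation, which makes it easy to confirm that the procedure lowered the energy of each mutant model.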

      (5) The authors have purified proteins. Do they show different properties in gel filtration that are consistent with the structural prediction? It is anticipated that open-form mutants elute earlier than closed forms.

      We thank the reviewer for this insightful suggestion. Indeed, our recent study showed that the open form of the active OSM-3 G444E mutant eluted earlier than the closed wild-type form (Xie et al., EMBO J., 2024). While the current study did not use gel filtration (size-exclusion chromatography, SEC) to directly compare the hydrodynamic properties of the OSM-3 mutants, our functional assays provide robust evidence for the conformational changes predicted by structural modeling. For example, ATPase activity assays revealed that the open-state mutants (e.g., G444E and PD mutants) exhibited significantly enhanced enzymatic activity (Figure 4A), consistent with structural predictions of an active, destabilized autoinhibitory interface (Figure 3A). These functional readouts collectively validate the predicted structural states. While SEC could further corroborate these findings by distinguishing compact (closed) from extended (open) conformations, we prioritized assays that directly link structural predictions to in vitro enzymatic activity and in vivo ciliary transport dynamics. Future studies incorporating SEC or cryo-EM will provide additional biophysical validation of these states.

      We have revised the text in the manuscript (Page #7, Lines #22): 

      “…Notably, the open-state OSM-3 mutants (e.g., G444E) displayed elevated ATPase activity, consistent with structural predictions of autoinhibition release (Fig. 3A, Fig. 4A) (Xie et al., 2024). While hydrodynamic profiling (e.g., SEC) could further resolve conformational states, our functional assays directly connect predicted structural changes to altered biochemical and cellular activity...”

      Minor point

      (1) Line 85 "MIMA kinase family" should be "NIMA kinase family".

      We have corrected the typo and thank the Reviewer for pointing it out.

      (2) M.S. and D.S. need to be defined in Figure 2D.

      We have updated the figures.

    1. Author Response

      The following is the authors’ response to the current reviews.

      1) The main issue relates to Set2, and how STIM1 expression rescues Set2-dependent functions in Set2 KO flies. If Set2 is downstream of STIM1, how would STIM1 over-expression rescue a Set2-dependent effect?

      The STIM rescue is of Set2 knockdown (RNAi) flies and NOT Set2 knockout flies. Overexpression of STIM raises SOCE in primary cultures of Drosophila neurons (as demonstrated in previous publications from our group: Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016). The higher SOCE drives greater expression of Set2 from the endogenous locus, thus reducing the efficacy of Set2 RNAi. This explains the rescue of Set2 KD flies by STIM in Figure S2E. We have explained this in lines 227-234.

      2) There is still no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant.

      Measurement of SOCE is not technically feasible in ex-vivo preps due to the presence of extracellular calcium in the brain milieu. In the past we have measured SOCE from primary cultures of central dopaminergic neurons expressing either native Orai OR OraiE180A mutant (Pathak et al., 2015) where we found that all dopaminergic neurons expressing OraiE180A exhibit very low SOCE. This is the reason we have not measured SOCE in the fewer cells of the fpDAN subset marked by THD' GAL4. This point has been specifically mentioned and explained in the section on “limitations of the study” at the end of the manuscript.

      3) The revised version does not include an analysis of the STIM:Orai stoichiometry, which has been demonstrated to be essential for SOCE.

      To measure such stoichiometry we would need to perform direct measurements of STIM and Orai levels by protein extraction from the fpDANs of all appropriate genotypes. This is not feasible due to the small number of cells available from each brain.

      I confirm that there are no changes to the text OR figures from the previous version of the manuscript.


      The following is the authors’ response to the original reviews.

      […]

      The manuscript by Mitra and coworkers analyses the functional role of Orai in the excitability of central dopaminergic neurons in Drosophila. The authors show that a dominant-negative mutant of Orai (OraiE180A) significantly alters the gene expression profile of flight-promoting dopaminergic neurons (fpDANs). Among them, OraiE180A attenuates the expression of Set2 and enhances that of E(z) shifting the level of epigenetic signatures that modulate gene expression. The present results also demonstrate that Set2 expression via Orai involves the transcription factor Trl. The Orai-Trl-Set2 pathway modulates the expression of VGCC, which, in turn, are involved in dopamine release. The topic investigated is interesting and timely and the study is carefully performed and technically sound; however, there are several major concerns that need to be addressed:

      1) In Figure S2E, STIM is overexpressed in the absence of Set2 and this leads to rescue. It is presumed that STIM overexpression causes excess SOCE, yet this is rarely the case. Perhaps the bigger concern, however, is how excess SOCE might overcome the loss of SET2 if SET2 mediates SOCE-induced development of flight. These data are more consistent with something other than SET2 mediating this function.

      Our statement that STIM overexpression overcomes deficits in SOCE is based on the following published work, which has been highlighted in the revised version of the manuscript (see Lines 226-233):

      1. Studies of SOCE in wildtype cultured larval Drosophila neurons demonstrated that overexpression of STIM raised SOCE to the same extent as co-expression of STIM and Orai in the WT background (Chakraborty et al, 2016; Figure 1D).

      2. Both Carbachol-induced IP3-mediated Ca2+ release and SOCE (measured by Ca2+ add back after Thapsigargin-induced store depletion) were rescued in primary cultures of IP3R hypomorphic mutant (itprku) Drosophila neurons by overexpression of STIM (Agrawal et al., 2010; Figure 8A-G).

      3. Deb et al., 2016 (Supplementary Figure 2h,i) reaffirmed that overexpression of STIM significantly improves SOCE after Thapsigargin-induced passive store-depletion in Drosophila neurons expressing IP3RRNAi.

      4. Consistent with the cellular rescue of SOCE, defects in flight initiation and physiology observed in the heteroallelic IP3R hypomorphic background (itprku) could be rescued by overexpression of STIM (Agrawal et al., 2010; Figure 3A-E) as well as Orai (Venkiteswaran and Hasan, 2009; Figure 3).

      5. In Figure S2E, we show that flight deficits arising from THD’> Set2RNAi are rescued upon overexpression of STIM (i.e. THD’>Set2RNAi; STIMOE). Here and in another recent publication (Mitra et al., 2021) we show that neurons expressing Set2RNAi exhibit reduced expression of the IP3R and reduced ER-Ca2+ release presumably leading to reduced SOCE. As mentioned above we have consistently found that STIM overexpression raises both IP3-mediated Ca2+ release and SOCE in Drosophila neurons.

      In this study, we propose that Ca2+ release through the IP3R followed by SOCE are part of a positive feedback loop (described in the revised manuscript- see Lines 302-307) driving expression of Set2 which in turn upregulates expression of mAChR and IP3R (Figure 3F) to regulate dopaminergic neuron function. Our observation that loss of Set2 (THD’>Set2RNAi) can be rescued by STIM overexpression is consistent with this model because:

      1. Loss of Set2 (THD’>Set2RNAi) results in downregulation of several genes including mAChR and IP3R leading to decreased SOCE.

      2. As evident from our previous studies increased STIM expression in the Set2RNAi background (THD’>Set2RNAi; STIMOE) is expected to enhance SOCE which we predict would rescue Set2 expression leading to rescue of other Set2 dependent downstream functions like flight (Figure 2D).

      2) In Figure 3, data is provided linking SET2 expression and Cch-induced Ca2+ responses. The presentation of these data is confusing. In addition, the results may be a simple side effect of SET2-dependent expression of IP3R. Given that this article is about SOCE, why isn't SOCE shown here? More generally, there are no measurements of SOCE in this entire article. Measuring SOCE (not what is measured in response to Cch) could help eliminate some of this confusion.

      This section has been re-written in the revised version for better clarity and we have explained how Set2-dependent IP3R expression is an important component of Orai-mediated Ca2+ entry in fpDANs (see Lines 302-307). Here, we propose that IP3-mediated Ca2+ release and SOCE, through Orai, are together part of a positive feedback loop (see Lines 286-307) driving transcription of Set2 which in turn upregulates mAChR and IP3R expression (Figure 3F). We hypothesized that the observed loss of CCh-induced Ca2+ response in the Set2RNAi background (Figure 3B-D; THD’>Set2RNAi) results from decreased itpr and mAChR expression and verified this in Figure 3E. This is further validated by the rescue of CCh-induced Ca2+ response and itpr/mAChR expression in the OraiE180A background upon Set2 overexpression (Figure 3B-E; THD’>OraiE180A; Set2OE). We were constrained to measure CCh-induced Ca2+ responses in OraiE180A expressing neurons for the following reasons (highlighted in the revised version of the manuscript; see Lines 307-313 and ‘Limitations of the study’, Lines 719-735):

      1. SOCE measurements through Tg mediated store Ca2+ release followed by Ca2+ add back require a 0 Ca2+ environment that can only be achieved in culture. The Drosophila brain is bathed in hemolymph which contains Ca2+ and there do not exist any methods to readily deplete Ca2+ from the tissue to create a 0 Ca2+ environment without also affecting the health of the neurons.

      2. Cultures of the subset of dopaminergic neurons (THD’) we have focused on in this study were not feasible due to the small number of neurons being studied from the total number of dopaminergic neurons in the brain (~35/400). In previous studies we have shown that SOCE post-Tg induced store depletion is abrogated in cultured dopaminergic neurons from Drosophila upon expression of OraiE180A (Pathak et al., 2015). Furthermore, Carbachol-induced IP3-mediated Ca2+ release is tightly coupled to SOCE in Drosophila neurons (Venkiteswaran and Hasan, 2009) and Ca2+ release from the IP3R is physiologically relevant for flight behavior in THD’ neurons (Sharma and Hasan, 2020).
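
      For readers unfamiliar with the add-back protocol mentioned above, SOCE in cultured cells is commonly summarized as the rise in indicator signal after Ca2+ is restored to store-depleted cells. The sketch below assumes a ΔF/F0 readout; the function name, frame indices, and trace values are hypothetical and do not come from the manuscript.

```python
# Sketch: summarize SOCE from a Ca2+ add-back recording as the peak dF/F0
# after add-back, relative to the level just before add-back. The trace,
# frame indices, and function name are invented for illustration.

def soce_peak(trace: list[float], baseline_frames: int, addback_frame: int) -> float:
    """Return the peak dF/F0 rise following Ca2+ add-back."""
    f0 = sum(trace[:baseline_frames]) / baseline_frames   # resting fluorescence
    dff = [(f - f0) / f0 for f in trace]                   # normalized trace
    pre_addback = dff[addback_frame - 1]                   # level after store depletion
    return max(dff[addback_frame:]) - pre_addback

# Frames 0-3: baseline in 0 Ca2+; frames 4-7: store release after Tg;
# frames 8-10: response after Ca2+ add-back.
trace = [100, 100, 100, 100, 120, 110, 105, 104, 150, 180, 160]
print(round(soce_peak(trace, baseline_frames=4, addback_frame=8), 2))
```

      The calculation depends on recording in a nominally Ca2+-free bath before add-back, which is the condition the authors note cannot be achieved in the intact brain.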

      3) A significant gap in the study relates to the conclusion that trl is a SOCE-regulated transcription factor. This conclusion is entirely based on genetic analysis of STIMKO heterozygous flies in which a copy of the trl13C hypomorph allele is introduced. While these results suggest a genetic interaction between the expression of the two genes, the evidence that expression translates into a functional interaction that places trl immediately downstream of SOCE is not rigorous or convincing. All that can be said is that the double mutant shows a defect in flight which could arise from an interruption of the circuit. Further, it is not clear whether the trl13C hypomorph is only introduced during the critical 72-96 hour time window when the Orai1E180E phenotype shows up. The same applies to the over-expression of Set2 and the other genes. If the expression is not temporally controlled, then the phenotype could be due to the blockade of an entirely different aspect of flight neuron function.

      The idea that Trl functions downstream of Orai-mediated Ca2+ entry in THD’ neurons is based on the following genetic evidence (highlighted in the revised version; see Lines 339-341; 351-367; 647-65; ‘Limitations of the study’: 736-739)

      1. In Figure 4D, we show evidence of genetic interaction between trl-STIM and trl-Set2. The rescue of trl13c/STIMKO with STIM overexpression in THD’ neurons indicates that excess SOCE (driven by STIMOE) may activate the residual Trl (there exists a WT Trl copy in this genetic background) to rescue THD’ flight function. This is further supported by the rescue of trl/STIMKO with Set2 overexpression in THD’ neurons, which is consistent with the feedback loop model proposed in Figure 5C (see Lines 390-396) where we propose that reduced SOCE leads to reduced ‘activated’ Trl and thus reduced Set2 expression, and the latter is rescued by SET2OE. The manner in which SOCE ‘activates’ Trl is the subject of ongoing investigations.

      2. The trl hypomorphic alleles (including trl13C) exist as genetic mutants and they affect Trl function in all tissues throughout development. While we concede that these mutant alleles would affect multiple functions at other stages of development, which may impinge on the phenotypes noted in Figure S4B, we have used a targeted RNAi approach to validate Trl function specifically in the THD’ neurons (see Figure 4C; Lines 339-341).

      3. Overexpression mediated rescues (including Set2) were not induced only during the critical 72-96 hrs APF developmental window. Having established that Orai function drives critical gene expression during this window (Figure 1), it is reasonable to assume that Set2 rescue of loss of flight in OraiE180A occurs in the same time window where flight is disrupted (see Lines 221-224).

      4) In Figure 4, data is shown that SOCE compensates for the loss of Trl, the presumed mediator of SOCE-dependent flight. The fact that flight deficits are rescued by raising SOCE in the absence of Trl is very inconsistent with this conclusion.

      We apologise for this confusion and have clarified in the revision (see Lines 346-367). trl13c is a recessive allele of Trl and has been written as such throughout the text and in the figures (i.e., trl13c and NOT Trl13c). In all cases of Trl mutant rescue by STIMOE and Set2OE there exists residual Trl that can be activated by excess SOCE, thus leading to the rescue. This is true for trl13C/STIMKO where each mutant is present as a heterozygote (the complete genotype of this strain is STIMKO/+; trl13c/+; this has been corrected in the revision). Similarly, for TrlRNAi we expect reduced levels (but not complete loss) of Trl. Thus the SOCE rescue of loss of Trl occurs in conditions where Trl levels are reduced but NOT absent. Homozygous trl null mutants are lethal.

      5) In Figure 5 (A-C), data is provided that Trl transcripts are unaffected by loss of SOCE and that overexpression cannot rescue flightlessness. From this, the authors conclude that this gene "must" be calcium responsive. While that is one possibility, it is also possible that these genes are not functionally linked.

      The idea that Trl is functionally linked to SOCE is based on the following evidence (included in the revised version- see Lines 339-341; 346-367; 391-396)

      1. In Figure 4C we show that flight defects caused by partial loss of Trl (THD’>TrlRNAi) were rescued by STIM overexpression (THD’>TrlRNAi; STIMOE). As mentioned above we have found that STIM overexpression raises SOCE.

      2. Heteroalleles of the trl13C hypomorph exhibit a strong genetic interaction with a single copy of the null allele of STIMKO as shown by the flight deficit of trl13c/+; STIMKO/+ (trl13C/STIMKO) flies (Figure 4D). The genotypes will be corrected in the revision.

      3. Flight defects in trl13C/STIMKO flies could be rescued by STIM overexpression in the THD’ neurons (trl13C/STIMKO; THD’>STIMOE)

      4. In Figure 4E, we show that partial loss of Trl in THD’ neurons (THD’>TrlRNAi) leads to decreased expression of the Ca2+ responsive genes mAChR, itpr, and Set2 genes indicating that Trl is a constituent of the SOCE-driven transcriptional feedback loop (see Figure 5C).

      Since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it could be activated by a Ca2+ dependent post-translational modification. Phosphoproteome analysis of Trl demonstrated that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which lies within a potential site for CaMKII. Independently, CaMKII has been identified as a binding partner of Trl from a Trl interactome study (Lomaev et al., 2017). Past work from our group (Ravi et al., 2018) identified a role for CaMKII in THD’ neurons in the context of flight. We are currently testing if CaMKII functions downstream of SOCE in THD’ neurons to mediate flight and will update this information in the next version of the manuscript.

      (Now included in the revised version of the manuscript as Figure S5; Lines 397-424.)

      6) There is no characterization of SOCE in fpDANs from flies expressing native Orai or the dominant negative OraiE180A mutant. While the authors refer to previous studies, as the manuscript is essentially based on Orai function thapsigargin-induced SOCE should be tested using the Ca2+ add-back protocol in order to assess the release of Ca2+ from the ER in response to thapsigargin as well as the subsequent SOCE.

      The fpDANs consist of 16-19 neurons in each hemisphere (PPL1 are 10-12 and PPM3 are 6-7 cells; Pathak et al., 2015). Measuring SOCE from these neurons in vivo is not possible due to the presence of abundant extracellular Ca2+ in the brain. Given their sparse number, it proved technically challenging to isolate the fpDANs in culture to perform SOCE measurements using the Ca2+ add back protocol. Due to these reasons, we have relied upon using Carbachol to elicit IP3-mediated Ca2+ release and SOCE as a proxy for in vivo SOCE. In previous studies we have shown that Carbachol treatment of cultured Drosophila neurons elicits IP3-mediated Ca2+ release and SOCE (Agrawal et al., 2010; Figure 8). Moreover, expression of OraiE180A completely blocks SOCE as measured in primary cultures of dopaminergic neurons (Pathak et al., 2015; Figure 1E). Hence we have not repeated SOCE measurements from all dopaminergic neurons in this work. In the revised version we have explicitly stated this weakness of our study and the reasons for it (See Lines 307-313; ‘Limitations of the study’-Lines 719-735).

      7) In the experiments performed to rescue flight duration in Set2RNAi individuals the authors overexpress STIM and attribute the effect to "Excess STIM presumably drives higher SOCE sufficient to rescue flight bout durations caused by deficient Set2 levels.". This should be experimentally tested as the STIM:Orai stoichiometry has been demonstrated as essential for SOCE.

      The assumption that STIM overexpression drives higher SOCE is based upon previously published work from Drosophila neurons (Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016), which demonstrates that excess WT STIM overcomes IP3R deficiencies (RNAi or hypomorphic mutants) to rescue SOCE. We agree that STIM-Orai stoichiometry is essential for SOCE, and propose that the rescue backgrounds possess sufficient WT Orai, which is recruited by the excess STIM to mediate the rescue. We have referenced the earlier work to validate our use of STIMOE for rescue of SOCE (See Lines 226-233).

      Here, we propose that Set2 is part of a positive feedback loop (see Lines 286-307) driving transcription of mAChR and IP3R (Figure 3F). In keeping with this hypothesis, we posit that the phenotypes observed in the Set2RNAi background (Figure 2D) result from decreased itpr and mAChR expression (validated in Figure 3E). This is further validated by the Set2 overexpression mediated rescue of OraiE180A (Figure 2D) and rescue of itpr/mAChR expression in the OraiE180A background (Figure 3B-E; THD’>OraiE180A; Set2OE).

      8) The authors show that overexpression of OraiE108A results in Stim downregulation at a mRNA level. What about the protein level? And more important, how does OraiE108A downregulate Stim expression? Does it promote Stim degradation? Does it inhibit Stim expression?

      We hypothesize that the changes in STIM mRNA observed in the THD’ > OraiE180A neurons stem from an overall reduction in IP3-mediated Ca2+ release and SOCE due to loss of Trl-Set2 driven gene expression detailed in our transcriptional feedback loop model (Figure 5C; see Lines 286-307; 581-591). We have attempted to explain this aspect more clearly in the revised version of the manuscript. While we agree that measuring levels of STIM protein would be helpful, estimation of protein levels from a limited number of neurons (~35 cells per brain) is technically challenging. The STIM antibody does not work well in immunohistochemistry. In the absence of any experimental evidence we cannot comment on how expression of OraiE180A might affect STIM protein turnover (see Lines 307-313).

      9) Lines 271-273, the authors state "whereas overexpression of a transgene encoding Set2 in THD' neurons either with loss of SOCE (OraiE180A) or with knockdown of the IP3R (itprRNAi), lead to significant rescue of the Ca2+ response". This is attributed to a positive effect of Set2 expression on IP3R expression and the authors show a positive correlation between these two parameters; however, there is no demonstration that Set2 expression can rescue IP3R expression in cells where the IP3R is knocked down (itprRNAi). This should be further demonstrated.

      The rescue of IP3R expression by Set2 overexpression in itprRNAi was demonstrated in a different set of Drosophila neurons in an earlier study (Mitra et al., 2021) and has not been repeated specifically in THD’ neurons (see Lines 286-307). Similar to the previous study, here we tested CCh-stimulated Ca2+ responses of THD’ neurons with itprRNAi and itprRNAi; Set2OE (Fig S3), which are indeed rescued by Set2OE (see Lines 280-285).

      10) The data presented in Figure 3E should be functionally demonstrated by analyzing the ability of CCh to release Ca2+ from the intracellular stores in the absence of extracellular Ca2+.

      CCh-mediated Ca2+ release from the intracellular stores in the absence of extracellular Ca2+ has been described in primary cultures of Drosophila neurons in previously published work (Venkiteswaran and Hasan, 2009; Agrawal et al., 2010). This work focuses on a set of 16-19 dopaminergic neurons in a hemisphere of the Drosophila central brain. It is technically challenging to generate a 0 Ca2+ environment in vivo, which is essential for measuring store Ca2+ release. Given their meagre numbers, primary culture of these neurons is not readily feasible (see Lines 307-313; ‘Limitations of the study’-Lines 719-735).

      11) The conclusion that SOCE regulates the neuronal excitability threshold is based entirely on either partial behavioral rescue of flight, or measurements of KCl-induced Ca2+ rises monitored by GCaMP6m in DAN neurons. The threshold for neuronal excitability is a precise parameter based on rheobase measurements of action potentials in current-clamp. Measurements of slow calcium signals using a slow dye such as GCaMp6m should not be equated with neuronal excitability. What is measured is a loss of the calcium response in high K depolarization experiments, which occurs due to the loss of expression of Cav channels. Hence, the use of this term is not accurate and will confuse readers. The use of terms referring to neuronal excitability needs to be changed throughout the manuscript. As such, the conclusions regarding neuronal excitability should be strongly tempered and the data reinterpreted as there are no true measurements of neuronal excitability in the manuscript. All that can be said is that expression of certain ion channel genes is suppressed. Since both Na+ channels and K+ channel expression is down-regulated, it is hard to say precisely how membrane excitability is altered without action potential analysis.

      The claim that SOCE influences neuronal excitability is based on the following observations:

      1. Interruption of the transcriptional feedback loop involving SOCE, Trl, and Set2 through loss of any of its constituents, results in the downregulation of VGCCs (Figure 5G, 6H), which are essential components of action potentials.

      2. OraiE180A mediated loss of SOCE in THD’ neurons abrogates the KCl-evoked depolarization response (Figure 6B, C) measured using GCaMP6m. We verified that this response requires VGCC function using pharmacological inhibition of L-type VGCCs (Figure 6E, F).

      3. SOCE deficient THD’ neurons, which were presumably compromised in their ability to evoke action potentials, could be rescued to undergo KCl-evoked depolarisation by expression of NachBac, which lowers the depolarization threshold (Figure 7C, D), or through optogenetic stimulation using CsChrimson (Figure 7F).

      We agree that ‘neuronal excitability threshold’ is a precise electrophysiological parameter that has not been directly investigated here by measurement of action potentials. Therefore, references to neuronal excitability have been tempered throughout the revised manuscript and replaced with a more generic reference to ‘neuronal activity’. In this context we have included further evidence supporting reduced activity of THD’ neurons upon loss of SOCE in the revision.

      Since one of the key functional outcomes of activity during critical developmental periods, such as the 72-96 hrs APF developmental window identified in this study, is remodelling of neuronal morphology, we decided to investigate the same in our context. Neuronal activity can drive changes in neurite complexity and axonal arborization (Depetris-Chauvin et al., 2011), especially during critical developmental periods (Sachse et al., 2007). To understand if Orai mediated Ca2+ entry and downstream gene expression through Set2 affects this activity-driven parameter, we investigated the morphology of fpDANs, and specifically measured the complexity of presynaptic terminals within the γ2α′1 lobe of the MB using super-resolution microscopy. We found striking changes in the neurite volume upon expression of OraiE180A, which could be rescued either by restoring Set2 (OraiE180A; Set2OE) or by inducing hyperactivity through NachBac expression (OraiE180A; NachBacOE). These data have been included in the revised manuscript (Figure 8 B, C, D; see Lines 481-482; 519-534; 584-591; 701-704).

      12) Related, since trl does not contain any molecular domains that could be regulated by Ca2+ signaling, it is unclear whether trl is directly regulated by SOCE or the regulation is highly indirect. Reporter assays evaluating trl activation upon Ca2+ rises would provide much stronger and more direct evidence for the conclusion that trl is a SOCE-regulated TF. As such the evidence is entirely based on RNAi downregulation of trl which indicates that trl is essential but has no bearing on exactly what point of the signaling cascade it is involved.

      We agree that luciferase Trl reporters would provide a direct method to test SOCE-mediated activation. Future investigations will be targeted in this direction. Regarding possible mechanisms of Trl activation: since we could not detect a well-defined Ca2+ binding domain in Trl, we hypothesize that it may be activated through phosphorylation by a Ca2+ sensitive kinase. Phosphoproteome analysis of Trl indicates that it does indeed undergo phosphorylation at a Threonine residue (T237; Zhai et al., 2008), which may be mediated by the Ca2+ sensitive kinase CaMKII, based on binding partners identified in the Trl interactome (Lomaev et al., 2017). Past work (Ravi et al., 2018) has indeed demonstrated a requirement for CaMKII in THD’ neurons for flight. We are currently testing whether CaMKII functions downstream of SOCE in these neurons to mediate flight, and will be updating this information in the next version of the manuscript.

      (New data and analysis have been included; see Figure S5; Lines 397-424; ‘Limitations of the study’, Lines 736-739.)

      13) Are NFAT levels altered in the Orai1 loss of function mutant? If not, this should be explicitly stated. It would seem based on previous literature that some gene regulation may be related to the downregulation of this established Ca2+-dependent transcription factor. Same for NFkb.

      As mentioned in the revised version of the manuscript (see Lines 315-326), Drosophila NFAT lacks a calcineurin binding site and is therefore not sensitive to Ca2+ (Keyser et al., 2007). In the past we tested if knockdown of NF-kB in dopaminergic neurons gave a flight phenotype and did not observe any measurable deficit. From the RNAseq data we find a slight downregulation of NFAT (0.49 fold, p value=0.048) and NF-kB (0.26 fold, p value=0.258), the significance of which is unclear at this point. We did not find any consensus binding sites for these two factors in the regulatory regions of downregulated genes from THD’ neurons.

      14) Does over-expression of Set2 restore ion channel expression especially those of the VGCCs? This would provide rigorous, direct evidence that SOCE-mediated regulation of VGCCs through Set2 controls voltage-gated calcium channel signaling.

      Set2 overexpression in the OraiE180A background indeed restores the expression of VGCC genes (see Figure 6H; Lines 461-468).

      15) All 6 representative panels from Figure 3B are duplicated in Figure 4G. Likewise, 2 representative panels from Figure 5H are duplicated in Figure 6D. Although these panels all represent the results from control experiments, the relevant experiments were likely not conducted at the same time and under the same conditions. Thus, control images from other experiments should not be used simply because they correspond to controls. This situation should be clarified.

      We regret the confusion caused by the same representative images for the control experiments. These have been replaced by new representative images for Figure 4G and 6D in the updated version of the manuscript.

      16) The figures are unusually busy and difficult to follow. In part this is because they usually have many panels (Fig. 1: A-I; Fig. 2, A-J, etc) but also because the arrangement of the panels is not consistent: sometimes the following panel is found to the right, other times it is below. It would help the reader to make the order of the panels consistent, and, if possible, reduce the number of panels and/or move some of the panels to new figures (eLife does not limit the number of display items).

      The image panels have been rearranged for ease of reading in the updated version of the manuscript.

      17) As a final recommendation, the reviewers suggest that the authors a- Reword the text that refers to membrane excitability since membrane excitability was not directly measured here. b-Explain why STIM1 rescues the partial loss of flight in Set2 RNAi flies (Fig. S2E); and c- Explain how/why trl is calcium regulated and test using luciferase (or other) reporter assays whether Orai activation leads to trl activation.

      a. Textual references to membrane excitability have been appropriately modified and some new data has been included in this regard (see Figure 8 B, C, D; Lines 481-483; 519-534; 584-591; 701-704).

      b. We have provided a detailed explanation for how STIM overexpression might rescue the phenotypes caused by Set2RNAi in Point 1 (see Lines 226-233). In short, these phenotypes depend upon IP3R mediated Ca2+ entry driving a transcriptional feedback loop. We relied upon past reports that STIM overexpression upregulates IP3R-mediated Ca2+ release and SOCE in Drosophila itpr mutant neurons (Agrawal et al., 2010; Chakraborty et al., 2016; Deb et al., 2016). We therefore propose that STIM overexpression in the Set2RNAi background rescues IP3R mediated Ca2+ release followed by SOCE, which drives enhanced Set2 transcription, counteracting the effects of the RNAi. We will explain this more clearly with past references in the next revision.

      c. We have provided a detailed response to this comment in Point 12. Briefly, we agree that building luciferase reporters for Trl could be an ideal strategy to test for its responsiveness to SOCE and needs to be done in the future. As an alternate strategy, we have looked at data from existing studies of interacting partners of Trl (Lomaev et al., 2017) and identified CaMKII, which is Ca2+ responsive (Braun and Schulman, 1995; Yasuda et al., 2022) and thus might activate Trl through a phosphorylation-switch like mechanism (see Figure S5; ‘Limitations of the study’-736-739; Lines 397-424). Moreover, a previous publication identified a requirement for CaMKII in THD’ neurons for Drosophila flight (Ravi et al., 2018). We have tested the ability of a dominant active version of CaMKII to rescue THD’>E180A flight deficits and have included this information in the next version of the manuscript.

      References

      1. Agrawal N, Venkiteswaran G, Sadaf S, Padmanabhan N, Banerjee S, Hasan G. Inositol 1,4,5-Trisphosphate Receptor and dSTIM Function in Drosophila Insulin-Producing Neurons Regulates Systemic Intracellular Calcium Homeostasis and Flight. J Neurosci. 2010. 30:1301-1313. doi:10.1523/jneurosci.3668-09.2010

      2. Braun AP, Schulman H. A non-selective cation current activated via the multifunctional Ca(2+)-calmodulin-dependent protein kinase in human epithelial cells. J Physiol. 1995. 488:37-55. doi:10.1113/jphysiol.1995.sp020944

      3. Chakraborty S, Deb BK, Chorna T, Konieczny V, Taylor CW, Hasan G. Mutant IP3 receptors attenuate store-operated Ca2+ entry by destabilizing STIM-Orai interactions in Drosophila neurons. J Cell Sci. 2016. 129:3903-3910. doi:10.1242/jcs.191585

      4. Deb BK, Pathak T, Hasan G. Store-independent modulation of Ca2+ entry through Orai by Septin 7. Nat Commun. 2016. 7:11751. doi:10.1038/ncomms11751

      5. Depetris-Chauvin A, Berni J, Aranovich EJ, Muraro NI, Beckwith EJ, Ceriani MF. Adult-specific electrical silencing of pacemaker neurons uncouples molecular clock from circadian outputs. Curr Biol. 2011. 21:1783-1793. doi:10.1016/j.cub.2011.09.027

      6. Keyser P, Borge-Renberg K, Hultmark D. The Drosophila NFAT homolog is involved in salt stress tolerance. Insect Biochem Mol Biol. 2007. 37:356-362. doi:10.1016/j.ibmb.2006.12.009

      7. Kilo L, Stürner T, Tavosanis G, Ziegler AB. Drosophila Dendritic Arborisation Neurons: Fantastic Actin Dynamics and Where to Find Them. Cells. 2021. 10:2777. doi:10.3390/cells10102777

      8. Lomaev D, Mikhailova A, Erokhin M, et al. The GAGA factor regulatory network: Identification of GAGA factor associated proteins. PLoS One. 2017. 12:e0173602. doi:10.1371/journal.pone.0173602

      9. Mitra R, Richhariya S, Jayakumar S, Notani D, Hasan G. IP3/Ca2+ signals regulate larval to pupal transition under nutrient stress through the H3K36 methyltransferase dSET2. Development. 2021. 148:dev199018. doi:10.1101/2020.11.25.399329

      10. Pathak T, Agrawal T, Richhariya S, Sadaf S, Hasan G. Store-Operated Calcium Entry through Orai Is Required for Transcriptional Maturation of the Flight Circuit in Drosophila. J Neurosci. 2015. 35:13784-13799. doi:10.1523/jneurosci.1680-15.2015

      11. Ravi P, Trivedi D, Hasan G. FMRFa receptor stimulated Ca2+ signals alter the activity of flight modulating central dopaminergic neurons in Drosophila melanogaster. Barsh GS, ed. PLOS Genet. 2018. 14:e1007459. doi:10.1371/journal.pgen.1007459

      12. Sachse S, Rueckert E, Keller A, Okada R, Tanaka NK, Ito K, Vosshall LB. Activity-dependent plasticity in an olfactory circuit. Neuron. 2007. 56:838-850. doi:10.1016/j.neuron.2007.10.035

      13. Sharma A, Hasan G. Modulation of flight and feeding behaviours requires presynaptic IP3Rs in dopaminergic neurons. eLife. 2020. 9:e62297. doi:10.7554/elife.62297

      14. Venkiteswaran G, Hasan G. Intracellular Ca2+ signalling and store operated Ca2+ entry are required in Drosophila neurons for flight. Proc Natl Acad Sci. 2009. 106:10326-10331. doi:10.1073/pnas.0902982106

      15. Yasuda R, Hayashi Y, Hell JW. CaMKII: a central molecular organizer of synaptic plasticity, learning and memory. Nat Rev Neurosci. 2022. 23:666-682. doi:10.1038/s41583-022-00624-2

      16. Zhai B, Villén J, Beausoleil SA, Mintseris J, Gygi SP. Phosphoproteome Analysis of Drosophila melanogaster Embryos. J Proteome Res. 2008. 7:1675-1682. doi:10.1021/pr700696a

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. Although it is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and orchestrating subsequent immune responses, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found an enrichment of skin mast cells provides X. laevis substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths: This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens.

      Weaknesses: The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells have the characteristic feature of degranulation to release histamine, serotonin, proteases, cytokines, and chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators, IgE and compound 48/80, for IgE-dependent and independent pathways. This can be easily done in vitro. It is also important to assess whether in vivo these mast cells are degranulated upon Bd infection using avidin staining to visualize vesicle releases from mast cells. Figure 3 only showed rSCF injection caused an increase in mast cells in naïve skin. They need to present whether Bd infection can induce mast cell increase and rSCF injection under Bd infection causes a mast cell increase in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any contents mentioned above.

      We would like to thank the reviewer for taking the time to review our work and for providing us with valuable feedback.

      Please note that amphibians do not possess the IgE antibody isotype [1].

      To our knowledge there have been no published studies using approaches for studying mammalian mast cell degranulation to examine amphibian mast cells. Notably, several studies suggest that amphibian mast cells lack histamine [2, 3, 4, 5] and serotonin [2, 6]. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these reagents may not cross-react with their amphibian counterparts. This is especially true of cytokines and chemokines, which diverged quickly with evolution and thus do not share substantial protein sequence identity across species as divergent as frogs and mammals. Respectfully, while following up on these findings is possible, it would involve considerable additional work to find reagents that would detect amphibian mast cell contents.

      We would also like to respectfully point out that while mast cell degranulation is a feature most associated with mammalian mast cells, this is not the only means by which mammalian mast cells confer their immunological effects. While we agree that defining the biology of amphibian mast cell degranulation is important, we anticipate that since the anti-Bd protection conferred by enriching frog mast cells is seen after 21 days of enrichment, it is quite possible that degranulation may not be the central mechanism by which the mast cells are mediating this protection.

      As noted in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), which is a hallmark cytokine associated with mammalian mast cells [7]. We are presently exploring the role of the frog IL4 in the observed mast cell anti-Bd protection. Should we generate meaningful findings in this regard, we will add them to the revised version of this manuscript.

      We are also exploring the heparin content of frog mast cells and capacities of these cells to degranulate in vitro in response to compound 48/80. In addition, we are exploring in vivo mast cell degranulation via histology and avidin-staining. Should these studies generate significant findings, we will include them in the revised version of this manuscript.

      Per the reviewer’s suggestion, in our revised manuscript we also plan to include data showing whether Bd infections affect skin mast cell numbers and how rSCF injection impacts skin mast cell numbers in the context of Bd infections.

      In regard to how mast cells impact Bd infections and skin microbiomes, our data indicate that mast cells are augmenting skin integrity during Bd infections and promoting mucus production, as indicated by the findings presented in Figure 4A-C and Figure 5A-C, respectively. There are several mammalian mast cell products that elicit mucus production. In mammals, this mucus production is mediated by goblet cells while the molecular control of amphibian skin mucus gland content remains incompletely understood. Interleukin-13 (IL13) is the major cytokine associated with mammalian mucus production [8], while to our knowledge this cytokine is either not encoded by amphibians or else has yet to be identified and annotated in these animals’ genomes. IL4 signaling also results in mucus production [9] and we are presently exploring the possible contribution of the X. laevis IL4 to skin mucus gland filling. Any significant findings on this front will be included in the revised manuscript. Histamine release contributes to mast cell-mediated mucus production [10], but as we outline above, several studies indicate that amphibian mast cells may lack histamine [2, 3, 4, 5]. Mammalian mast cell-produced lipid mediators also play a critical role in eliciting mucus secretion [11] and our transcriptomic analysis indicates that frog mast cells express several enzymes associated with production of such mediators. We will highlight this observation in our revised manuscript.

      We anticipate that X. laevis mast cells influence skin integrity, microbial composition and Bd susceptibility in a myriad of ways. Considering the substantial differences between amphibian and mammalian evolutionary histories and physiologies, we anticipate that many of the mechanisms by which X. laevis mast cells confer anti-Bd protection will prove to be specific to amphibians and some even unique to X. laevis. We are most interested in deciphering what these mechanisms are but foresee that they will not necessarily reflect what one would expect based on what we know about mammalian mast cells in the context of mammalian physiologies.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates the expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology.

      Strengths:

      The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of the exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism.

      We thank the reviewer for recognizing the breadth and extent of the undertaking that culminated in this manuscript. Indeed, this manuscript would not have been possible without considerable reagent development and adaptation of techniques that had previously not been used for amphibian immunity research. In line with the reviewer’s sentiment, to our knowledge this is the first report of using molecular approaches to augment amphibian mast cells, which we hope will pave the way for new areas of research within the fields of comparative immunology and amphibian disease biology.

      Weaknesses:

      The conclusions regarding the role of mast cell expansion in controlling Bd infection would be stronger with a more rigorous evaluation of the model, as there are some key gaps and remaining questions regarding the data. For example:

      1. Granulocyte expansion is carefully quantified in the initial time courses of rSCF and rCSF3 injections, but similar quantification is not provided in the disease models (Figures 3E, 4G, 5D-G). A key implication of the opposing effects of mast cell vs neutrophil expansion is that mast cells may suppress neutrophil recruitment or function. Alternatively, mast cells also express notable levels of csf3r (Figure 2) and previous work from this group (Hauser et al, Facets 2020) showed rG-CSF-stimulated peritoneal granulocytes express mast cell markers including kit and tpsab1, raising the question of what effect rCSF3 might have on mast cell populations in the skin. Considering these points, it would be helpful if both mast cells and neutrophils were quantified histologically (based on Figure 1, they can be readily distinguished by SE or Giemsa stain) in the Bd infection models.

      We thank the reviewer for this insightful suggestion. We are performing a further examination of skin granulocyte content during Bd infections and plan on including any significant findings in our revised manuscript.

      We predict that rSCF administration results in the accumulation of mast cells that are polarized such that they ablate the inflammatory response elicited by Bd infection. Mammalian mast cells, including peritonea-resident mast cells, express csf3r<sup>12, 13</sup>. Although the X. laevis animal model does not permit nearly the degree of immune cell resolution afforded by mammalian animal models, we do know that the adult X. laevis peritonea contain heterogeneous leukocyte populations. We anticipate that the high kit expression reported by Hauser et al., 2020 in the rCSF3-recruited peritoneal leukocytes reflects the presence of mast cells therein. As such and in acknowledgement of the reviewer’s suggestion, we also think that the cells recruited by rCSF3 into the skin may include not only neutrophils but also mast cells. Possibly, these mast cells have distinct polarization states from those enriched by rSCF. While the lack of antibodies against frog neutrophils or mast cells has limited our capacity to address this question, we will attempt to reexamine by histology the proportions of skin neutrophils and mast cells in the skins of frogs under the conditions described in our manuscript. Any new findings in this regard will be included in the revised version of this work.

      2. Epithelial thickness and inflammation in Bd infection are reported to be reduced by rSCF treatment (Figure 3E, 5A-B) or increased by rCSF3 treatment (Figure 4G) but quantification of these critical readouts is not shown.

      We thank the reviewer for this suggestion. We will score epithelial thickness under the distinct conditions described in our manuscript and present the quantified data in the revised paper.

      3. Critical time points in the Bd model are incompletely characterized. Mast cell expansion decreases zoospore burden at 21 dpi, while there is no difference at 7 dpi (Figure 3E). Conversely, neutrophil expansion increases zoospore burden at 7 dpi, but no corresponding 21 dpi data is shown for comparison (Figure 4G). Microbiota analysis is performed at a third time point, 10 dpi (Figure 5D-G), making it difficult to compare with the data from the 7 dpi and 21 dpi time points. Reporting consistent readouts at these three time points is important to draw solid conclusions about the relationship of mast cell expansion to Bd infection and shifts in microbiota.

      Because there were no significant effects of mast cell enrichment at 7 days post Bd infection, we chose to look at the microbiome composition in a subsequent experiment at 10 days and 21 days post Bd infection, with 10 days being a bit more of a midway point between the initial exposure and day 21, when we see the effect on Bd loads. We will clarify this rationale in the revised manuscript.

      The enrichment of neutrophils in frog skins resulted in prompt (12 hours post enrichment) skin thickening (in the absence of Bd infection) and increased frog Bd susceptibility by 7 days of infection. Conversely, mast cell enrichment stabilized the skin mucosal and symbiotic microbial environments, presumably accounting at least in part for the lack of further Bd growth on mast cell-enriched animals by 21 days of infection. Our question regarding the roles of inflammatory granulocytes/neutrophils during Bd infections was that of ‘how’ rather than ‘when’ these cells affect Bd infections. Because the central focus of this work was mast cells and not other granulocyte subsets, when we saw that rCSF3-recruited granulocytes adversely affected Bd infections at 7 days post infection, we did not pursue the kinetics of these responses further. We plan to explore the roles of inflammatory mediators and disparate frog immune cell subsets during the course of Bd infections, but we feel that these future studies are more peripheral to the central thesis of the present manuscript regarding the roles of frog mast cells during Bd infections.

      4. Although the effect of rSCF treatment on Bd zoospores is significant at 21 dpi (Figure 3E), bacterial microbiota changes at 21 dpi are not (Figure S3B-C). This discrepancy, how it relates to the bacterial microbiota changes at 10 dpi, and why 7, 10, and 21 dpi time points were chosen for these different readouts (Figure 5F-G), is not discussed.

      Our results indicate that after 10 days of Bd infection, control Bd-challenged animals exhibited reduced microbial richness, while skin mast cell-enriched Bd-infected frogs were protected from this disruption of their microbiome. The amphibian microbiome serves as a major barrier to these fungal infections<sup>14</sup>, and we anticipate that Bd-mediated disruption of microbial richness and composition facilitates host skin colonization by this pathogen. Control and mast cell-enriched animals had similar skin Bd loads at 10 days post infection. However, by 21 days of Bd infection the mast cell-enriched animals maintained their Bd loads at the levels observed at 10 days post infection, whereas the control animals had significantly greater Bd loads. Thus, we anticipate that frog mast cells are conferring the observed anti-Bd protection in part by preventing microbial disassembly and thus interfering with optimal Bd colonization and growth on frog skins. In other words, maintained microbial composition at 10 days of infection may be preventing additional Bd colonization/growth, as seen when comparing skins of control and mast cell-enriched frogs at 21 days post infection. By 21 days of infection, control animals rebounded from the Bd-mediated reduction in bacterial richness seen at 10 days. The fact that after 21 days of infection control animals also had significantly greater Bd loads than mast cell-enriched animals suggests that there may be a critical earlier window during which microbial composition is able to counteract Bd growth.

      While the current draft of our manuscript has a paragraph to this effect (see below), we appreciate the reviewer conveying to us that our perspective on the relationship between skin mast cells and the kinetics of microbial composition and Bd loads could be better emphasized. We plan to revise our manuscript to include the above discussion points.

      Bd infections caused major reductions in bacterial taxa richness, changes in composition and substantial increases in the relative abundance of Bd-inhibitory bacteria early in the infection. Similar changes to microbiome structure occur during experimental Bd infections of red-backed salamanders and mountain yellow-legged frogs<sup>15, 16</sup>. In turn, progressing Bd infections corresponded with a return to baseline levels of Bd-inhibitory bacteria abundance and rebounding microbial richness, albeit with dissimilar communities to those seen in control animals. These temporal changes indicate that amphibian microbiomes are dynamic, as are the effects of Bd infections on them. Indeed, Bd infections may have long-lasting impacts on amphibian microbiomes<sup>15</sup>. While Bd infections manifested in these considerable changes to frog skin microbiome structure, mast cell enrichment appeared to counteract these deleterious effects to their microbial composition. Presumably, the greater skin mucosal integrity and mucus production observed after mast cell enrichment served to stabilize the cutaneous environment during Bd infections, thereby ameliorating the Bd-mediated microbiome changes. While this work explored the changes in established antifungal flora, we anticipate the mast cell-mediated inhibition of Bd may be due to additional, yet unidentified bacterial or fungal taxa. Intriguingly, while mammalian skin mast cell functionality depends on microbiome-elicited SCF production by keratinocytes<sup>17</sup>, our results indicate that frog skin mast cells in turn impact skin microbiome structure and likely their function. It will be interesting to further explore the interdependent nature of amphibian skin microbiomes and resident mast cells.

      5. The time course of rSCF or rCSF3 treatments relative to Bd infection in the experiments is not clear. Were the treatments given 12 hours prior to the final analysis point to maximize the effect? For example, in Figure 3E, were rSCF injections given at 6.5 dpi and 20.5 dpi? Or were treatments administered on day 0 of the infection model? If the latter, how do the authors explain the effects at 7 dpi or 21 dpi given mast cell and neutrophil numbers return to baseline within 24 hours after rSCF or rCSF3 treatment, respectively?

      Please find the schematic of the immune manipulation, Bd infection, and sample collection times below. We will include a figure like this in our revised manuscript.

      The title of the manuscript may be mildly overstated. Although Bd infection can indeed be deadly, mortality was not a readout in this study, and it is not clear from the data reported that expanding skin mast cells would ultimately prevent progression to death in Bd infections.

      We acknowledge this point. The revised manuscript will be titled: “Amphibian mast cells: barriers to chytrid fungus infections”.

      Reviewer #3 (Public Review):

      Summary:

      Hauser et al. provide an exceptional study describing the role of resident mast cells in amphibian epidermis that produce anti-inflammatory cytokines that prevent Batrachochytrium dendrobatidis (Bd) infection from causing harmful inflammation, and also protect frogs from changes in skin microbiomes and loss of mucin in glands and loss of mucus integrity that otherwise cause changes to their skin microbiomes. Neutrophils, in contrast, were not protective against Bd infection. Beyond the beautiful cytology and transcriptional profiling, the authors utilized elegant cell enrichment experiments to enrich mast cells by recombinant stem cell factor, or to enrich neutrophils by recombinant colony-stimulating factor-3, and examined respective infection outcomes in Xenopus.

      Strengths:

      Through the use of recombinant IL4, the authors were able to test and eliminate the hypothesis that mast cell production of IL4 was the mechanism of host protection from Bd infection. Instead, impacts on the mucus glands and interaction with the skin microbiome are implicated as the protective mechanism. These results will press disease ecologists to examine the relative importance of this immune defense among species, the influence of mast cells on the skin microbiome and mucosal function, and open the potential for modulating mucosal defense.

      We thank the reviewer for recognizing the significance and utility of the findings presented in our manuscript.

      Weaknesses:

      A reduction of bacterial diversity upon infection, as described at the end of the results section, may not always be an "adverse effect," particularly given that anti-Bd function of the microbiome increased. Some authors (see Letourneau et al. 2022 ISME, or Woodhams et al. 2023 DCI) consider these short-term alterations as encoding ecological memory, such that continued exposure to a pathogen would encounter an enriched microbial defense. Regardless, mast cell-initiated protection of the mucus layer may negate the need for this microbial memory defense.

      We thank the reviewer for their insightful comment. We will revise our discussion to include this possible interpretation.

      While the description of the mast cell location in the epidermal skin layer in amphibians is novel, it is not known how representative these results are across species ranging in chytridiomycosis susceptibility. No management applications are provided such as methods to increase this defense without the use of recombinant stem cell factor, and more discussion is needed on how the mast cell component (abundance, distribution in the skin) of the epidermis develops or is regulated.

      We appreciate the reviewer’s comment and would like to point out that the work presented in our manuscript was driven by comparative immunology questions more than by conservation biology.

      We thank the reviewer for suggesting expanding our discussion to include potential management applications and potential mechanisms for regulating frog skin mast cells. While any content to these effects would be highly speculative, we agree that it may spark new interest and pave new avenues for research. To this end, our revised manuscript will include a paragraph to this effect.

      References:

      1. Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      2. Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      3. Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      4. Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      5. Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      6. Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      7. Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      8. Lai, H. & Rogers, D.F. New pharmacotherapy for airway mucus hypersecretion in asthma and COPD: targeting intracellular signaling pathways. J Aerosol Med Pulm Drug Deliv 23, 219-231 (2010).

      9. Rankin, J.A. et al. Phenotypic and physiologic characterization of transgenic mice expressing interleukin 4 in the lung: lymphocytic and eosinophilic inflammation without airway hyperreactivity. Proc Natl Acad Sci U S A 93, 7821-7825 (1996).

      10. Church, M.K. Allergy, Histamine and Antihistamines. Handb Exp Pharmacol 241, 321-331 (2017).

      11. Nakamura, T. The roles of lipid mediators in type I hypersensitivity. J Pharmacol Sci 147, 126-131 (2021).

      12. Aponte-Lopez, A., Enciso, J., Munoz-Cruz, S. & Fuentes-Panana, E.M. An In Vitro Model of Mast Cell Recruitment and Activation by Breast Cancer Cells Supports Anti-Tumoral Responses. Int J Mol Sci 21 (2020).

      13. Jamur, M.C. et al. Mast cell repopulation of the peritoneal cavity: contribution of mast cell progenitors versus bone marrow derived committed mast cell precursors. BMC Immunol 11, 32 (2010).

      14. Walke, J.B. & Belden, L.K. Harnessing the Microbiome to Prevent Fungal Infections: Lessons from Amphibians. PLoS Pathog 12, e1005796 (2016).

      15. Jani, A.J. et al. The amphibian microbiome exhibits poor resilience following pathogen-induced disturbance. ISME J 15, 1628-1640 (2021).

      16. Muletz-Wolz, C.R., Fleischer, R.C. & Lips, K.R. Fungal disease and temperature alter skin microbiome structure in an experimental salamander system. Mol Ecol 28, 2917-2931 (2019).

      17. Wang, Z. et al. Skin microbiome promotes mast cell maturation by triggering stem cell factor production in keratinocytes. J Allergy Clin Immunol 139, 1205-1216 e1206 (2017).

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      For decades it has been accepted that only the growth-arrested "stumpy" form of Trypanosoma brucei can infect the arthropod vector, the Tsetse fly, but this was recently challenged by a demonstration that - under artificial conditions that are known to enhance infectivity - the proliferative "slender" form can also establish Tsetse infections. The infectiousness of the two forms is a fundamental question in trypanosome biology and epidemiology, concerning both infection dynamics and parasite differentiation. The authors of the current study provide compelling evidence that without artificial enhancement, the "stumpy" form is indeed much more infective for Tsetse than the slender form; they suggest that this is probably also true in the wild.

      Since the authors of this paper did not themselves test the effect of enhancing conditions, the precise reason for the discrepancy in results between the two laboratories has not been demonstrated conclusively.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study may be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplementation of an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ngoune et al. present compelling evidence that Slender cells are challenged to infect tsetse flies. They explore the experimental context of a recent important paper in the field, Schuster et al., that presents evidence suggesting the proliferative Slender bloodstream T. brucei can infect juvenile tsetse flies. Schuster et al. was disruptive to the widely accepted paradigm that the Stumpy bloodstream form is solely responsible for tsetse infection and T. brucei transmission potential. Evidence presented here shows that in all cases, Stumpy form parasites are exponentially more capable of infecting tsetse flies. They further show that Slender cells do not infect mature flies.

      However, they raise questions of immature tsetse immunological potential and field transmission potential that their experiments do not address. Specifically, they do not show that teneral tsetse flies are immunocompromised, that tsetse flies must be immunocompromised for Slender infection nor that younger teneral tsetse infection is not pertinent to field transmission.

      All these specific comments were addressed in the revision and illustrated with new data and references.

      - The limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs 19-22.

      - Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study may be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplementation of an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      - Our comment on the relevance to field transmission is simply based on field observations of the fly biology. For example, according to the capture-recapture experiments described in Hargrove JW, Insect Sci Applic 1990 (new ref 23), wild female mortality was reported as 6.8% shortly after emergence, <1% for ages 20-50 days, and rose to 5% by 130 days (a pattern similar to that for laboratory-reared tsetse), while wild male daily mortality was 8.3% after emergence, fell to 5.5% by 9 days, then rose continuously to more than 10% by 30 days. This means that adult flies represent the majority of individuals in a wild tsetse population. Hence, knowing that both males and females are strictly hematophagous and that they can live up to nine months, the impact of teneral flies (up to 4 days after emergence) on trypanosome transmission appears limited, if not incidental.
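As a rough illustration of the age-structure reasoning in the last point, the sketch below estimates the share of teneral flies in a stable male population from the daily mortality figures quoted above. Only the three anchor values (8.3% at emergence, 5.5% by day 9, about 10% by day 30) come from the text. The linear interpolation between them, the flat 10% rate after day 30, the 270-day cap, the constant emergence rate, and the function names are all assumptions made for illustration, not part of the cited field study.

```python
def daily_mortality(age_days: int) -> float:
    """Illustrative daily mortality for wild male tsetse at a given age.

    Anchor values are taken from the response text; the linear
    interpolation and the flat rate after day 30 are assumptions.
    """
    if age_days <= 9:
        # Falls from 8.3% at emergence to 5.5% by day 9.
        return 0.083 + (0.055 - 0.083) * age_days / 9
    if age_days <= 30:
        # Rises from 5.5% at day 9 to 10% at day 30.
        return 0.055 + (0.10 - 0.055) * (age_days - 9) / 21
    # Assumed constant beyond day 30 (the text only says "more than 10%").
    return 0.10


def teneral_fraction(teneral_days: int = 4, max_age_days: int = 270) -> float:
    """Share of flies younger than `teneral_days` in a stable age distribution.

    Assumes a constant daily emergence rate, so the number of flies of a
    given age is proportional to the probability of surviving to that age.
    """
    survivors = 1.0  # probability of being alive at the current age
    teneral = 0.0
    total = 0.0
    for age in range(max_age_days):
        total += survivors
        if age < teneral_days:
            teneral += survivors
        survivors *= 1.0 - daily_mortality(age)
    return teneral / total


if __name__ == "__main__":
    print(f"Estimated teneral share of males: {teneral_fraction():.1%}")
```

Under these assumptions tenerals come out as a minority of the male population. Because the female mortality figures quoted above are lower at most ages, the same calculation for females would give a smaller teneral share still.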

      Strengths:

      Experimental Design is precise and elegant, outcomes are convincing. Discussion is compelling and important to the field. This is a timely piece that adds important data to a critical discussion of host:parasite interactions, of relevance to all parasite transmission.

      Thank you

      Weaknesses:

      As above, the authors dispute the biological relevance of teneral tsetse infection in the wild, without offering evidence to the contrary. Statements need to be softened for claims regarding immunological competence or relevance to field transmission.

      All these specific comments were addressed in the revision and illustrated with new data and references.

      - The limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs 19-22.

      - Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study may be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplementation of an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      - Our comment on the relevance to field transmission is simply based on field observations of the fly biology. For example, according to the capture-recapture experiments described in Hargrove JW, Insect Sci Applic 1990 (new ref 23), wild female mortality was reported as 6.8% shortly after emergence, <1% for ages 20-50 days, and rose to 5% by 130 days (a pattern similar to that for laboratory-reared tsetse), while wild male daily mortality was 8.3% after emergence, fell to 5.5% by 9 days, then rose continuously to more than 10% by 30 days. This means that adult flies represent the majority of individuals in a wild tsetse population. Hence, knowing that both males and females are strictly hematophagous and that they can live up to nine months, the impact of teneral flies (up to 4 days after emergence) on trypanosome transmission appears limited, if not incidental.

      Reviewer #2 (Public Review):

      Summary:

      In contrast to the recent findings reported by Schuster S et al., this brief paper presents evidence suggesting that the stumpy form of T. brucei is likely the most pre-adapted form to progress through the life cycle of this parasite in the tsetse vector.

      Strengths:

      One significant experimental point is that all fly infection experiments are conducted in the absence of "boosting" metabolites like GlcNAc or S-glutathione. As a result, flies infected with slender trypanosomes present very low or nonexistent infection rates. This provides important experimental evidence that the findings of Schuster S and colleagues may need to be revisited.

      Thank you

      Weaknesses:

      However, I believe the authors should have included their own set of experiments demonstrating that the presence of these metabolites in the infectious bloodmeal enhances infection rates in flies receiving blood meals containing slender trypanosomes. Considering the well-known physiological variabilities among flies from different facilities, including infection rates, this would have strengthened the experimental evidence presented by the authors.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study may be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplementation of an infective meal containing 10<sup>5</sup> slender parasites/ml (Figure 2). Whereas no infection was detected in adult flies, N-acetyl-glucosamine supplementation of the infective meal increased the infection rate from 2.4% to 13.3% in teneral flies (Figure 2).

      Reviewer #3 (Public Review):

      The dogma in the Trypanosome field is that transmission by Tsetse flies is ensured by stumpy forms. This has been recently challenged by the Engstler lab (Schuster et al.), who showed that slender forms can also be transmitted by teneral flies. In this work, the authors aimed to test whether transmission by slender forms is possible and frequent. The authors observed that most stumpy forms infections with teneral and adult flies were successful while only 1 out of 24 slender form infections were successful.

      In this revised version of the manuscript, the authors made some text changes and included statistical testing as a new section of the Materials and Methods. It seems the comparison of midgut infection in adult vs teneral flies was significant in most of the conditions. However, the critical comparison is still missing: within each type of fly (adult or teneral), was the MG infection significantly different between slender and stumpy forms?

      An ANOVA statistical analysis was performed and a dedicated section was added to the revised version. MG infection rate comparisons were statistically significant between teneral and adult flies infected with ST at each parasite number (p<0.02 with 10 parasites; p<0.0001 with 100 and 1,000 parasites) and with 1,000 SL (p<0.0001). MG infection rate comparisons were statistically significant (p<0.0001) between parasite stages (SL and ST) at each parasite number (10, 100 and 1,000) and for each fly group (teneral and adult), except in teneral flies infected with 1,000 parasites (p=0.2356).

      Given no additional experiments were performed, it remains unknown why this work and Schuster et al. reached different conclusions. As a result it remains unclear in which conditions slender forms could be important for transmission. Several variables could explain differences between the two groups: the strain used, the presence or absence of N-acetylglucosamine and/or glutathione, how Tsetse colonies were maintained, thorough molecular and cellular characterisation of slender and stumpy forms (to avoid using intermediate forms as slender forms), comparison to recent field parasite strains.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites / ml (Figure 2). Whereas no infection was detected in adult flies, the N-acetyl-glucosamine supplementation of the infective meal led to an increase of the infection rates from 2.4% to 13.3% in teneral flies (Figure 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is improved, but the author has not addressed much of the constructive criticism offered that would benefit the manuscript.

      To clarify, evidence from Schuster et al did not demonstrate, rather it suggested. That is a major point of this paper - that the previous evidence presented had caveats. Terms such as demonstrate or prove are inappropriate in most biological contexts, unless evidence is without caveat.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites / ml (Figure 2). Whereas no infection was detected in adult flies, the N-acetyl-glucosamine supplementation of the infective meal led to an increase of the infection rates from 2.4% to 13.3% in teneral flies (Figure 2).

      Statements regarding teneral flies in the field are softened. Yet the referenced papers pertain more to commensurate coinfections rather than reduced immunocapacity of immature teneral flies in the field. This should be clarified.

      The limited immunocompetence of teneral flies has been extensively studied by the labs of S. Aksoy at Yale and M. Lehane at Liverpool. In the discussion, we provide key references from these two labs (refs. 19-22).

      The text remains convoluted to read with grammatical errors in places. For example, it is incorrect to begin a sentence with However. There are far too many run-on sentences in the manuscript that confuse this straightforward story.

      The revised text was improved as much as possible.

      All text requires grammatical refinement and softer claims unless additional experiments are undertaken.

      Reviewer #2 (Recommendations For The Authors):

      I continue to endorse the publication of this manuscript; however, I am somewhat disappointed by the authors' justifications for not conducting additional experiments or exploring other factors that might influence the infection phenotypes in the fly.

      This specific comment was addressed in the revision and illustrated with new data.

      Differences between the strain clones, the cell culture conditions and/or the fly colony maintenance conditions could explain part of the differences in infection rates observed here as compared to the Schuster et al. study (1). However, the use of the lectin-inhibitory sugar N-acetyl-glucosamine to enhance infection rates in the latter study could be a more likely explanation. To assess this hypothesis, an additional experimental challenge was performed to compare infection rates in teneral versus adult flies, with or without N-acetyl-glucosamine supplement in an infective meal containing 10<sup>5</sup> slender parasites / ml (Figure 2). Whereas no infection was detected in adult flies, the N-acetyl-glucosamine supplementation of the infective meal led to an increase of the infection rates from 2.4% to 13.3% in teneral flies (Figure 2).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors performed an integration of 48 scRNA-seq public datasets and created a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells). This is important since most AML scRNA-seq studies suffer from small sample size coupled with high heterogeneity. They used this atlas to further dissect AML with t(8;21) (AML1-ETO/RUNX1-RUNX1T1), which is one of the most frequent AML subtypes in young people. In particular, they were able to predict Gene Regulatory Networks in this AML subtype using pySCENIC, which identified the paediatric regulon defined by a distinct group of hematopoietic transcription factors (TFs) and the adult regulon for t(8;21). They further validated this in bulk RNA-seq with the AUCell algorithm and attributed the inferred prenatal signature to 5 key TFs (KDM5A, REST, BCLAF1, YY1, and RAD21), and the postnatal signature to 9 TFs (ENO1, TFDP1, MYBL2, KLF1, TAGLN2, KLF2, IRF7, SPI1, and YBX1). They also used SCENIC+ to identify enhancer-driven regulons (eRegulons), forming an eGRN, and found that prenatal origin shows a specific HSC eRegulon profile, while a postnatal origin shows a GMP profile. They also did an in silico perturbation and found AP-1 complex (JUN, ATF4, FOSL2), P300, and BCLAF1 as important TFs to induce differentiation. Overall, I found this study very important in creating a comprehensive resource for AML research.

      Strengths:

      (1) The generation of an AML atlas integrating multiple datasets with almost 750K cells will further support the community working on AML.

      (2) Characterisation of t(8;21) AML proposes new interesting leads.

      We thank the reviewer for a succinct summary of our work and highlighting its strengths.

      Weaknesses:

      Were these t(8;21) TFs/regulons identified from any of the single datasets? For example, if the authors apply pySCENIC to any dataset, would they find the same TFs, or is it the increase in the number of cells that allows identification of these?

      The purpose of our study was to gain biological insights by integrating multiple datasets, to overcome limitations from small sample size. We expect that the larger dataset would improve network inference, which is what we implemented in the manuscript, hence we have not looked at individual datasets. However, we will investigate this further in the revised manuscript by running pySCENIC on individual datasets and comparing to the results drawn from the whole atlas.
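
      The comparison proposed above (per-dataset pySCENIC results versus the whole atlas) could be quantified in several ways. The following is only a minimal sketch of one option, a Jaccard overlap between sets of regulon TFs. It is not the authors' analysis: the function names, the dataset labels, and the placeholder TF names (`TF_A`, etc.) are all hypothetical, and the inputs are assumed to be TF lists already exported from separate pySCENIC runs.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_to_atlas(atlas_tfs, per_dataset_tfs):
    """Return the overlap between atlas-level regulon TFs and each dataset's TFs.

    atlas_tfs: iterable of TF names inferred from the integrated atlas.
    per_dataset_tfs: dict mapping a dataset label to its own iterable of TF names.
    """
    return {name: jaccard(atlas_tfs, tfs) for name, tfs in per_dataset_tfs.items()}


# Hypothetical placeholder inputs, not real results.
atlas = {"TF_A", "TF_B", "TF_C"}
per_dataset = {
    "dataset_1": {"TF_A", "TF_B"},  # recovers two of three atlas TFs
    "dataset_2": {"TF_D"},          # recovers none
}
overlap = compare_to_atlas(atlas, per_dataset)
```

      A low overlap for small datasets and a higher overlap for large ones would be consistent with the expectation that cell number drives network inference, although the index alone cannot distinguish that from batch or cohort effects.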

      Reviewer #2 (Public review):

      Summary:

      The authors assemble 222 publicly available bone marrow single-cell RNA sequencing samples from healthy donors and primary AML, including pediatric, adolescent, and adult patients at diagnosis. Focusing on one specific subtype, t(8;21), which, despite affecting all age classes, is associated with better prognosis and drug response for younger patients, the authors investigate if this difference is reflected also in the transcriptomic signal. Specifically, they hypothesize that the pediatric and part of the young population acquires leukemic mutations in utero, which leads to a different leukemogenic transformation and ultimately to differently regulated leukemic stem cells with respect to the adult counterpart. The analysis in this work heavily relies on regulatory network inference and clustering (via SCENIC tools), which identifies regulatory modules believed to distinguish the pre-, respectively, post-natal leukemic transformation. Bulk RNA-seq and scATAC-seq datasets displaying the same signatures are subsequently used for extending the pool of putative signature-specific TFs and enhancer elements. Through gene set enrichment, ontology, and perturbation simulation, the authors aim to interpret the regulatory signatures and translate them into potential onset-specific therapeutic targets. The putative pre-natal signature is associated with increased chemosensitivity, RNA splicing, histone modification, stem-ness marker SMARCA2, and potentially maintained by EP300 and BCLAF1.

      Strengths:

      The main strength of this work is the compilation of a pediatric AML atlas using the efficient Cellxgene interface. Also, the idea of identifying markers for different disease onsets, interpreting them from a developmental angle, and connecting this to the different therapy and relapse observations, is interesting. The results obtained, the set of putative up-regulated TFs, are biologically coherent with the mechanisms and the conclusions drawn. I also appreciate that the analysis code was made available and is well documented.

      We thank the reviewer for reviewing our work and highlighting its key features, including the creation of the AML atlas and the downstream analysis and interpretation for the t(8;21) subtype.

      We also appreciate useful critique of our paper provided below.

      Weaknesses:

      There were fundamental flaws in how methods and samples were applied, a general lack of critical examination of both the results and the appropriateness of the methods for the data at hand, and in how results were presented. In particular:

      (1) Cell type annotation:

      a) The 2-phase cell type annotation process employed for the scRNA-seq sample collection raised concerns. Initially annotated cells are re-labeled after a second round with the same cell types from the initial label pool (Figure 1E). The automatic annotation tools were used without specifying the database and tissue atlases used as a reference, and no information was shown regarding the consensus across these tools.

      We believe that most of the reviewer’s criticisms stem from a misunderstanding, and we apologize for not explaining certain aspects of our work more clearly.

      The two types of cell type annotation applied were different and served distinct purposes:

      • One was using general bone marrow/blood reference datasets to annotate blood subtype lineage clusters.

      • The other was using a CD34 purified AML specific reference dataset which included leukaemia-associated annotations, to identify HSPC subpopulations. We also implemented this on a single-cell level to allow more robust identification of these rare populations in a large dataset.

      This is probably not well explained in the methods and figure presentation. We will clearly indicate in the revised manuscript that different HSPC annotations represent separate analysis and will update the figures to highlight this. We will provide a comprehensive review of the annotation strategies implemented, including the automated tool outputs, which may be useful for the single-cell community.

      b) Expression of the CD34 marker is reported as the only selection method for HSPCs, which is not in line with common practice. The use of CD34 alone is accepted as a surface marker, while robust annotation of HSPCs should be done on the basis of expression of gene sets.

      We used CD34 expression in conjunction with other cell type annotations and marker sets to identify LSCs, although the results are the same when we use HSPC-annotated cells without conditioning on CD34 expression. In the revised manuscript, we will simplify this analysis to use HSPC clusters as suggested by the reviewer.

      c) During several analyses, the cell types used were either not well defined or contradictory, such as in Figure 2D, where it is not clear if pySCENIC and AUC scores were computed on HSPCs alone or merged with CMPs. In other cases, different cell type populations are compared and used interchangeably: comparing the HSPC-derived regulons with bulk (probably not enriched for CD34+ cells) RNA samples could be an issue if there are no valid assumptions on the cell composition of the bulk sample.

      As mentioned in the Methods, we only excluded lymphoid cell types from the pySCENIC analysis to overcome the bias that some samples were enriched using CD34 selection when preparing them for scRNA-seq. We will make this clearer in the text and figures of the revised manuscript. It is difficult to overcome this bias when using bulk RNA samples, which may explain why some of our samples do not fit into our defined signature groups. However, as we do not have access to primary samples ourselves, we cannot provide a better matched experimental cohort for validation.

      (2) Method selection:

      a) The authors should explain why they use pySCENIC and not any other approach. They should briefly explain how pySCENIC works and what they get out in the main text. In addition they should explain the AUCell algorithm and motivate its usage.

      pySCENIC is a state-of-the-art method for network inference from scRNA-seq data and is widely used within the single-cell community (over 5000 citations for both versions of the SCENIC pipeline). The pipeline has been benchmarked as one of the top performers for GRN analysis (Nguyen et al, 2021. Briefings in Bioinformatics). AUCell is a module within the pySCENIC pipeline that summarises the activity of a set of genes (a regulon) into a single number, which helps to compare and visualise different regulons. We agree with the reviewer that this could have been more clearly explained within the manuscript. We will update the text in the revised manuscript to add more explanation.
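
      To illustrate the idea of summarising a regulon's activity as a single number, the sketch below computes a simplified recovery-curve score for one cell: genes are ranked by expression, and the score is the area under the curve that counts how many regulon genes appear among the top-ranked genes. This is an assumption-laden illustration of the general principle, not the AUCell implementation. The function name `recovery_auc`, the toy gene names, the `top_k` cutoff, and the normalisation by the best achievable area are all choices made here for clarity.

```python
def recovery_auc(expression, gene_set, top_k):
    """Simplified AUCell-style score for a single cell.

    expression: dict mapping gene name -> expression value in this cell.
    gene_set:   iterable of gene names (e.g. the targets of one regulon).
    top_k:      number of top-ranked genes over which the area is computed.
    Returns a value in [0, 1]; 1 means all usable gene-set members rank first.
    """
    # Rank genes from highest to lowest expression (ties broken by name).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    targets = set(gene_set) & set(expression)
    top_k = min(top_k, len(ranked))
    if not targets or top_k <= 0:
        return 0.0

    hits = 0  # gene-set members seen so far
    area = 0  # running sum of the recovery-curve height at each rank
    for gene in ranked[:top_k]:
        if gene in targets:
            hits += 1
        area += hits

    # Normalise by the area obtained if every target sat at the top of the ranking.
    m = min(len(targets), top_k)
    max_area = sum(min(rank, m) for rank in range(1, top_k + 1))
    return area / max_area


# Toy cell with six hypothetical genes.
cell = {"A": 9, "B": 7, "C": 5, "D": 3, "E": 1, "F": 0}
high = recovery_auc(cell, ["A", "B"], top_k=4)  # both genes rank first
low = recovery_auc(cell, ["E", "F"], top_k=4)   # neither gene is in the top 4
mid = recovery_auc(cell, ["A", "D"], top_k=4)   # one gene first, one gene fourth
```

      Because the score depends only on within-cell ranks, it is insensitive to per-cell scaling of expression values, which is one reason rank-based scores are convenient for comparing regulons across heterogeneous samples.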

      b) The obtained GRN signatures were not critically challenged on an external dataset. Therefore, the evidence that supports these signatures to be reliable and significant to the investigated setting is weak.

      These signatures were inferred from the most suitable AML single-cell RNA datasets available to date, and we used two independent datasets to validate our findings (the TARGET AML bulk RNA sequencing cohort, and the Lambo et al. scRNA-seq dataset). To our knowledge, there are no better suited datasets for validation. Experimental validations on patient samples are beyond the scope of this study.

      (3) There are some issues with the analysis & visualization of the data.

      We will provide new statistical tests to improve robustness of the analysis as well as presentation and visualization of the data in the revised manuscript.

      (4) Discussion:

      a) What exactly is the 'regulon signature' that the authors infer? How can it be useful for insights into disease mechanisms?

      The 'regulon signature' here refers to a gene regulatory program (multiple gene modules, each defined by a transcription factor and its targets) that is specific to a given age group. Further investigation into this can be useful for understanding why patients of different ages follow a different clinical course. We will add more text on the utility of our discovered 'regulon signature' in the discussion section of the revised manuscript.

      b) The authors write 'Together this indicates that EP300 inhibition may be particularly effective in t(8;21) AML, and that BCLAF1 may present a new therapeutic target for t(8;21) AML, particularly in children with inferred pre-natal origin of the driver translocation.' I am missing a critical discussion of what is needed to further test the two targets. Put differently: Would the authors take the risk of a clinical study given the evidence from their analysis?

      Of course, many extensive studies would be required before these findings are clinically translatable. We can include some perspectives on what further work is required in terms of further experimental validation and potential subsequent clinical study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you for your valuable comments, which helped us improve our manuscript. We will make the following modifications in the revised manuscript:

      (1) In the first paragraph of the Result section, we will provide a summary of trimeric G proteins in Ciona and explain how we focused on Gαs and Gαq in the initial phase of this study.

      We added a summary of trimeric G proteins in Ciona in the initial part of the Results section (page 6, line 23 to page 8, line 5). In this summary, we added the following sentence explaining the reason we focused on Gαs and Gαq in the initial phase of this study: "Among them, we prioritized examining the Gα proteins having an excitatory function (Gαq and Gαs) rather than inhibitory roles since previous studies suggested that excitatory events like Ca<sup>2+</sup> transient and neuropeptide secretion occur when Ciona metamorphose."

      (2) As the reviewer 1 suggests, the polymodal roles of papilla neurons are interesting. Although we could not address this through functional analyses in this study, we will add a discussion regarding this aspect. The sentences will be something like the following:

      “The recent study (Hoyer et al., 2024) provided several lines of evidence suggesting that PSNs can serve as the sensors of several chemicals in addition to the mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca<sup>2+</sup> and cAMP production. The use of G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis in the appropriate situation, both mechanically and chemically.”

      We added a discussion related to the recent publication by Hoyer and colleagues on page 23, lines 13-18: "A recent study[19] provided several lines of evidence suggesting that PNs can serve as the sensors of several chemicals in addition to mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca<sup>2+</sup> and cAMP production. G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis either mechanically or chemically according to the situation."

      (3) As both reviewers suggested, imaging cAMP on the backgrounds of some G protein knockdowns is essential, and we will conduct the experiments.

      We added the data on cAMP imaging in Gαs, Gαq, and dvGαi_Chr2 knockdown larvae in Supplementary Figure S4C-D and Figure 6E.

      (4) We will carefully modify the text throughout the manuscript so that the descriptions suitably reflect the results.

      We modified the descriptions of experimental results so that the text reflects the results more precisely.

      Reviewer #1:

      Pg1 - need to add an additional '6' to the author list to clarify which two or more authors contributed equally.

      We added a 6 as suggested. Thank you for pointing this out.

      Pg3 - note that the larval adhesive organ applies not to all benthic adults, but to benthic sessile adults. This makes it sound like the adhesive organ can trigger metamorphosis, but has that been shown? In Ciona or others? Need to specify the role of cells secreting adhesive vs. sensory cells that trigger metamorphosis.

      We divided the corresponding sentence into two to clearly state that adhesion and triggering metamorphosis are related but could be different events. Moreover, we modified the sentence to state that physical contact is one example of a cue triggering metamorphosis. We then added another example of a factor triggering metamorphosis—i.e., chemicals from the organisms surrounding the adherence site (page 3, lines 16-20 of the revised version):

      "Many marine invertebrates exhibit a benthic lifestyle at the adult stage[4]. Their planktonic larvae have an adhesive organ that secretes adhesives and adheres to a substratum. The cues associated with the adhesion, such as the physical contact with the substratum and a chemical from organisms surrounding the adherence site, can trigger their metamorphosis."

      Pg 4 - although mechanosensation is the focus here, could there also be chemoreception/chemoreceptors involved in Ciona metamorphosis? For example, Hoyer et al. 2024 (Current Biology 34(6):1168-1182) concluded that some palp sensory neurons were multimodal and could be both chemo- and mechano-sensory.

      We added statements about this recent finding in the Introduction and Discussion sections. In the Introduction (page 4, lines 16-18), however, we also stated that a mechanical stimulus can trigger metamorphosis in the lab without the need to supply these chemicals. This is to emphasize that the mechanical stimulus is the focus of this study. In the Discussion, we added a statement that G-protein signaling could also be used to receive the chemical stimuli (page 23, lines 13-18).

      Pg 6 - Before starting functional characterizations, it would be useful to give an overview (table?) of the G proteins found in papillae, and what receptor they are suspected of binding to, or if this is completely unknown, and which downstream pathways they likely activate. That is, to show some results about which G proteins are found in Ciona, and which are found in papillae. In this way, it will make more sense for readers when the Gai is suddenly introduced later, following the sections of Gaq and Gas.

      Thank you for your idea to improve the readability of this manuscript. In the initial part of the Results section (page 6, line 22 to page 8, line 5), we added descriptions of the repertoire of trimeric G-proteins in Ciona, including phylogenetic analyses, and expression in the papillae based on RNA-seq data, followed by the reason why we initially focused on Gαq and Gαs. The data are displayed in Supplementary Figure S1. The phylogenetic analyses were modified from those shown in Supplementary Figure S5 of the previous version. We also added the general downstream activities of Gαs, Gαi and Gαq in the Introduction section (page 6, lines 10-12). Considering the contents, the general function of Gα12/13 was stated in the Results section (page 8, lines 2-3).

      We did not add the information about their partner receptors in this early section. This is because there are many candidates, and we could not single out particular ones. Instead, we described our current suppositions about their possible partners in the Discussion (page 23, line 22 to page 24, line 19). However, we suspect that there are more candidates, and we wish to promote unbiased research in the future.

      Pg 9 - would be good to know the timing of this PF fluorescence increase and the timing of stimulation in the text here, relevant to the 30-min gap before metamorphosis initiation

      We added the start times for the cAMP reduction and re-upregulation in the following sentence (page 11, lines 17-18): "The cAMP reduction and increase respectively started at 35 seconds and 4 min 40 seconds after stimulation on average."

      Pg 28 - Phylogenetic analysis: Given that the results may be of interest to metamorphosis in other marine invertebrates as discussed in the last paragraph of the paper, it would be useful to include G proteins from these other animal phyla where available in the phylogenetic tree. Similarly, in Figure S5A it would be useful to highlight further all the different Ciona G proteins, and the different protein families, through the use of additional colour/labelling (regardless of whether this remains Fig S5A, or becomes part of the main figures)

      We drew a phylogenetic tree of G-proteins including those in some sessile and benthic animals (barnacle, sea anemone, hydra, sponge, sea urchin and shell). However, we decided not to add the tree in the revised version because, unfortunately, the bootstrap values of many branches were not high enough to have confidence in the results. We hope you understand our decision. Ciona divergent G-proteins are likely to be specific to Ciona.

      According to your comment, we highlighted all Ciona G alpha proteins in red in Figure S5A, which is now Figure S1A in the revised version.

      Figure 3E and Figure S3 - is the data shown as an average of all larvae measured (n=5 and n=4) or is it data from one representative larva out of the 4-5 measured? This needs clarification.

      The original graphs in Figure 3E and Figure S3 are typical examples. We added the graphs summarizing data of all larvae in each experimental condition in Supplementary Figure S4 (corresponding to Supplementary Figure S3 of the original version). Figure 3E remains as a typical example of the result of a single larva to explain our data analysis in detail.

      Experimental suggestion - As mentioned above, one missing detail seems to be the need for evidence that cAMP is elevated in the papillae directly as a result of Gs activation- this could be shown with measurement of cAMP via PF in Gs knockdown larvae that are mechanically stimulated compared to wildtype stimulated and non-stimulated?

      Thank you for your suggestion. The experiments are indeed important. We added the data of Pink Flamindo imaging in the Gαs, Gαq and dvGαi_Chr2 knockdown conditions. The results of Gαs and Gαq knockdowns are described in page 11, line 24 to page 12, line 5, and are displayed in Supplementary Figure S4C-D. The result of dvGαi_Chr2 knockdown is given on page 16, lines 20-22 and shown in Figure 6E.

      In order to insert the data of cAMP imaging of dvGαi_Chr2 knockdown larvae, we transferred some panels of Figure 6 to Supplementary Figure S6. In addition, the knockdown data of dvGαi_Chr4 and double knockdowns of Gαi genes are also included in Supplementary Figure S6.

      Reviewer #2:

      Page 6, line 3-4 in the first paragraph of the "Results"; the authors state "Neither morphant showed any signature of metamorphosis even though both were allowed to adhere to the base of culture dishes...". However, judging from Fig. 1E, "the percentage of metamorphosis initiation" (indicated by the initiation of tail regression) in Gαq morphants is not close to 0 (average about 40%), thus I am not convinced this observation can be described as "Neither morphant showed any signature of metamorphosis..." in this sentence.

      Thank you for your suggestion. In writing the original text, we oversimplified some of the descriptions when trying to improve the readability. We agree this resulted in imprecision in places. We have revised all these passages in our revision. In this particular case, we softened the overly emphatic statement to better reflect the results, changing “... any signature of metamorphosis...” to “... reduced rate of metamorphosis initiation...” In addition, we stated that the effect of Gαq MO was weaker than that of Gαs MO on page 8, lines 10-12. The weaker effect of Gαq MO was due to the redundant role of the Gi pathway, which is shown on page 17, lines 10-17, and in Figure 6G-H.

      Similarly, in the next paragraph describing the knockdown of PLCβ1/2/3, PLCβ4, and IP3R genes, the authors appear to neglect that there is a weaker effect of the PLCβ4 MO, and simply described the results as "The knockdown larvae of these three genes failed to start metamorphosis". Based on Fig. 1H, about 30% of the PLCβ4 MO-injected animals still initiated tail regression. This difference may have some biological meaning and thus should be described more precisely.

      We added the following sentence on page 8, lines 18-19 of the revised version: “The effect of PLCβ4 MO was weaker than those of the other MOs, suggesting that this PLC plays an auxiliary role.”

      Page 7, second paragraph, on the description of GCaMP8 fluorescence and also at the end of Fig. 1O legend, the citation to "Figure S1" is confusing; Fig. S1 is the phylogenetic tree of PLCβ proteins. Is there additional data regarding this Gαq MO plus GCaMP8 mRNA injection experiment?

      Figure S1 of the original version corresponds to Figure S2 of the revised version. To avoid confusion, we deleted this citation from the legend of Figure 1O. With this modification, the sentence stating the repertoire of PLCβ and IP3R in Ciona (page 8, lines 15-16) is the only sentence citing Figure S2 in the revised version.

      Page 8, first sentence; The purpose of theophylline treatment is not to prevent larvae from adhesion, thus I would suggest modifying this sentence to: "We treated wild-type larvae with theophylline after tail amputation, and we observed that most theophylline-treated larvae completed tail regression without adhesion (Figure 2D-F)".

      We modified the sentence according to your comment. Thank you for your suggestion.

      Page 9, second paragraph; judging from the data presented in Fig. 3C, I think this description: "when papillae were removed from larvae, theophylline failed to induce metamorphosis" is not accurate, because about 30% of the Papilla cut + Theophylline-treated larvae still initiated their tail regression. This needs to be explained clearly.

      We modified the sentence (page 11, lines 2-3) as follows: “...the average rate of metamorphosis induction by theophylline was reduced from 100% to 30%...”

      Similarly, in the next few sentences regarding the results presented in Fig. 3D, the effects of overexpressing those genes are not uniform. While amputation of papillae in larvae overexpressing caPLCβ1/2/3 could inhibit metamorphosis almost completely, papilla cut seems to have a weaker effect on caGαq, caGαs, and bPAC-overexpressing larvae.

      We added a description explaining that caPLCβ1/2/3 was the most sensitive to papilla amputation, and the possibility that PLCβ1/2/3 works specifically in the papillae (page 11, lines 9-11): “Among these experiments, caPLCβ1/2/3 overexpression was the most sensitive to papilla amputation, suggesting that PLCβ1/2/3 acts specifically in the papillae during metamorphosis.”

      Page 9, the paragraph on using the fluorescent cAMP indicator; there is a discrepancy between the described developmental time when the authors conducted this experiment and the metamorphosis competent timing (after 24hpf) described on page 7. On page 26, the authors describe "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass bottom dishes at 20-21 hpf...". Did the authors start stimulating the larvae to observe the fluorescent signal soon after immobilization, or wait several hours until the larvae passed 24hpf and then conduct the experiment?

      The latter is the case. The immobilized larvae were kept until they acquired the competence for metamorphosis and then stimulation/recording was carried out. This point is described in the Materials and Methods section of the revised version (page 29, lines 16-18):

      "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass-bottom dishes at 20-21 hpf, and stimulated their adhesive papillae around 25 hpf."

      Page 10, the description "...Gαq morphants initiated metamorphosis when caGαs was overexpressed in the nervous system (Figure 4F)". It should be noted that the result is only a partial rescue. To be precise, this description needs to be modified.

      We changed the sentence to reflect the results more precisely (page 14, lines 2-3): “Moreover, caGαs overexpression in the nervous system significantly, although not perfectly, ameliorated the effect of Gαq MO (Figure 4F).”

      Page 12-13, This description and Figure 5E as presented are a bit confusing to me. The figure legend for 5E: "GABA is necessary for Ca2+ transient in the adhesive papillae (arrow)". But the arrow in this image points to a place with no fluorescent signal, and in the upper corner it is labeled as "29% (n=17)". Does that mean the proportion of "no Ca2+ increase after stimulation" was 29% among the 17 samples examined? Or is it actually the other way around, that 71% of the examined larvae did not show a Ca2+ signal increase after stimulation?

      The latter is the case. We added a caption explaining this clearly in the Figure legend: “The percentage and number exhibit the rate of animals showing Ca<sup>2+</sup> transient in the papillae.”

      Page 13, second paragraph; I do not agree with the overly simplified description that "GABA significantly ameliorated the metamorphosis-failed phenocopies of Gαq, PLCβ, and Gαs morphants". As shown in Fig. 5F-H, adding GABA exerts different levels of partial rescue effect on each morphant, and thus should be described clearly.

      When the outliers are neglected, the effect of GABA is most evident in Gαs knockdowns. This suggests that the target(s) of GABA signaling is more likely to be Gq pathway components. We added the following sentence to the revised version (page 15, lines 14-16):

      “Among the three morphants, GABA exhibited the most effective rescues in Gαs knockdowns than Gαq and PLCβ.”

      In addition, we think this sentence establishes a more logical connection with the sentence that follows it: “These results could be explained by assuming enhancement of the Gq pathway by GABA through PLCβ and another GABA-mediated metamorphic pathway bypassing Gq components.” Thank you for your suggestion.

      The section "Contribution of Gi to metamorphosis" confirmed the possibility that GABA signaling targets Gq pathway components.

      Page 13, the first paragraph on "Contribution of Gi to metamorphosis"; the description that "The knockdown of this gene (Gαi) exhibited a significantly reduced rate of metamorphosis;..." is misleading. I would suggest modifying the entire sentence as "The knockdown of this gene (Gαi) exhibited a moderate (although statistically significant) reduction of metamorphosis rate, suggesting the presence of another Gαi regulating metamorphosis".

      Thank you for your suggestion. We modified the sentence (page 16, lines 2-4 in the revised version) as recommended. We believe the description is much improved.

      Page 20, the last sentence about Ciona papilla neurons expressing transcription factor Islet; the authors seem to attempt to make some comparison with the vertebrate pancreatic beta cells in this paragraph, but the comparison and the argument are not fully developed in this current format.

      To deepen this discussion, we added the following sentence (page 23, lines 10-12): “The atypical secretion of GABA might depend on the transcription factor like Islet shared between Ciona papilla neurons and vertebrate beta cells.”

      However, we would like to limit the depth of our discussion on this point, as we hope to expand on it further in future studies.

      Other suggestions:

      Page 3, second paragraph: as they become unable to "move" after metamorphosis -> "relocate"

      We corrected the word as suggested.

      Page 4, second paragraph: In the first sentence, the author states the current understanding of chordate phylogeny and cites Delsuc et al. 2006 Nature paper at the end of this sentence. However, in this paper cephalochordates were erroneously grouped with echinoderms, and thus chordates did not form a monophyletic clade. A later paper by Bourlat et al. (Nature 444:85-88, 2006) corrected this problem, and subsequently Delsuc et al. also published another paper (genesis, 46:592-604, 2008) with broader sampling to overcome this problem. These later publications need to be included for the sake of correctness.

      We added this reference.

      Page 14, regarding the redundant function of the typical Gαi protein in the papillae; the authors may try double KD of Gαi and dvGαi_Chr2 in their experimental system to test this idea.

      We carried out double knockdown of typical Gαi and dvGαi_Chr2. However, we could not address their redundant role sufficiently because most of the double knockdown larvae exhibited severe shape malformation.

      dvGαi_Chr4 is also expressed in the papillae. We carried out knockdown of this gene and found that it resulted in a very minor but statistically significant reduction of the metamorphosis rate, suggesting that this Gαi also plays a supportive role in metamorphosis. We also carried out double knockdown of dvGαi_Chr2 and dvGαi_Chr4. The double KD larvae exhibited responsiveness to GABA, probably because of the presence of typical Gαi.

      These results are described on page 16, lines 2-18, and the data are shown in Supplementary Figure S6A-D of the revised version.

      Responses to the Reviewing editor's comments:

      "Larvae of the ascidian Ciona initiate metamorphosis tens of minutes after adhesion to a substratum via its adhesive organ." - Larvae is plural so change to 'via their adhesive organ'

      The sentence was corrected as suggested.

      "Metamorphosis is a widespread feature of animal development that allows them" - revise the sentence, e.g. "Metamorphosis is a widespread feature of development that allows animals"

      The sentence was corrected as suggested.

      "GABA synthase (GAD)" GAD is not called GABA synthase but glutamate decarboxylase - clarify, e.g. encoding the enzyme synthesizing GABA called glutamate decarboxylase (GAD)

      This part was corrected exactly as suggested. Thank you.

      "IP3 is received by its receptor on the endoplasmic reticulum (ER) and releases calcium ion (Ca2+ )" revise to "IP3 is received by its receptor on the endoplasmic reticulum (ER) that releases calcium ion (Ca2+ )"

      The sentence was corrected as suggested.

      "Moreover, GPCR is implicated as the mediator of settlement" - GPCRs are implicated

      This sentence was modified as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1(Public review):

      Summary:

      This manuscript details the results of a small pilot study of neoadjuvant radiotherapy followed by combination treatment with hormone therapy and dalpiciclib for early-stage HR+/HER2-negative breast cancer.

      Strengths:

      The strengths of the manuscript include the scientific rationale behind the approach and the inclusion of some simple translational studies.

      Weaknesses:

      The main weakness of the manuscript is that overly strong conclusions are made by the authors based on a very small study of twelve patients. A study this small is not powered to fully characterize the efficacy or safety of a treatment approach, and can, at best, demonstrate feasibility. These data need validation in a larger cohort before they can have any implications for clinical practice, and the treatment approach outlined should not yet be considered a true alternative to standard evidence-based approaches.

      I would urge the authors and readers to exercise caution when comparing results of this 12-patient pilot study to historical studies, many of which were much larger, and had different treatment protocols and baseline patient characteristics. Cross-trial comparisons like this are prone to mislead, even when comparing well powered studies. With such a small sample size, the risk of statistical error is very high, and comparisons like this have little meaning.

      We greatly appreciate your evaluation of our study and fully agree with the limitations you have pointed out. We have clearly stated the limitations of the small sample size and emphasized the need for a larger population to validate our preliminary findings in the discussion section (Lines 311-316).

      We acknowledge that this small sample size is not powered to characterize this regimen as a promising alternative regimen in the treatment of patients with HR-positive, HER2-negative breast cancer. Therefore, we have revised the description of this regimen to serve as a feasible option for neoadjuvant therapy in HR-positive, HER2-negative breast cancers both in the discussion (Lines 317-320) and the abstract (Lines 71-72).

      We agree with you that cross-trial comparisons should be approached with caution due to differences in study designs and patient populations. In our discussion section, we acknowledge that small sample size limited the comparison of our data with historical data in the literature due to the potential bias (Lines 312-313). We clearly state that such comparisons hold limited significance (Lines 313-314) and suggest a larger population to validate our preliminary findings.

      • Why was dalpiciclib chosen, as opposed to another CDK4/6 inhibitor?

      Thank you for your comments. The rationale for selecting dalpiciclib over other CDK4/6 inhibitors in our study is primarily based on the following considerations:

      (1) Clinical Efficacy: In several clinical trials, including DAWNA-1 and DAWNA-2, the combination of dalpiciclib with endocrine therapies such as fulvestrant, letrozole, or anastrozole has been shown to significantly extend the progression-free survival (PFS) in patients with hormone receptor-positive, HER2-negative advanced breast cancer [1-2].

      (2) Tolerability and Management of Adverse Reactions: The primary adverse reactions associated with dalpiciclib are neutropenia, leukopenia, and anemia. Despite these potential side effects, the majority of patients are able to tolerate them, and with proper monitoring and management, these reactions can be effectively mitigated [1-2].

      (3) Comparable pharmacodynamics to other CDK4/6 inhibitors: The combination of CDK4/6 inhibitors, including palbociclib, ribociclib, and abemaciclib, with aromatase inhibitors has demonstrated an enhanced ability to suppress tumor proliferation and increase the rate of clinical response in neoadjuvant therapy for HR-positive, HER2-negative breast cancer [3-5]. Furthermore, preclinical studies have shown that dalpiciclib has comparable in vivo and in vitro pharmacodynamic activity to palbociclib, suggesting its potential effectiveness in similar treatment regimens [6].

      (4) Accessibility and Regulatory Approval: Dalpiciclib gained marketing approval in China on December 31, 2021, which facilitates access to this medication and makes it a more convenient option when considering treatment plans.

      References:

      (1) Zhang P, Zhang Q, Tong Z, et al. Dalpiciclib plus letrozole or anastrozole versus placebo plus letrozole or anastrozole as first-line treatment in patients with hormone receptor-positive, HER2-negative advanced breast cancer (DAWNA-2): a multicentre, randomised, double-blind, placebo-controlled, phase 3 trial[J]. The Lancet Oncology, 2023, 24(6): 646-657.

      (2) Xu B, Zhang Q, Zhang P, et al. Dalpiciclib or placebo plus fulvestrant in hormone receptor-positive and HER2-negative advanced breast cancer: a randomized, phase 3 trial[J]. Nature medicine, 2021, 27(11): 1904-1909.

      (3) Hurvitz S A, Martin M, Press M F, et al. Potent cell-cycle inhibition and upregulation of immune response with abemaciclib and anastrozole in neoMONARCH, phase II neoadjuvant study in HR+/HER2− breast cancer[J]. Clinical Cancer Research, 2020, 26(3): 566-580.

      (4) Prat A, Saura C, Pascual T, et al. Ribociclib plus letrozole versus chemotherapy for postmenopausal women with hormone receptor-positive, HER2-negative, luminal B breast cancer (CORALLEEN): an open-label, multicentre, randomised, phase 2 trial[J]. The lancet oncology, 2020, 21(1): 33-43.

      (5) Ma C X, Gao F, Luo J, et al. NeoPalAna: neoadjuvant palbociclib, a cyclin-dependent kinase 4/6 inhibitor, and anastrozole for clinical stage 2 or 3 estrogen receptor–positive breast cancer[J]. Clinical Cancer Research, 2017, 23(15): 4055-4065.

      (6) Long F, He Y, Fu H, et al. Preclinical characterization of SHR6390, a novel CDK 4/6 inhibitor, in vitro and in human tumor xenograft models[J]. Cancer science, 2019, 110(4): 1420-1430.

      • The eligibility criteria are not consistent throughout the manuscript, sometimes saying early breast cancer, other times saying stage II/III by MRI criteria.

      Thank you for pointing out the inconsistencies in the description of the eligibility criteria in our manuscript. We deeply apologize for any confusion caused by these inconsistencies. We have revised the term from “early-stage HR-positive, HER2-negative breast cancer” to “early or locally advanced HR-positive, HER2-negative breast cancer” (Lines 128 and 150). The term “early or locally advanced” encompasses two different stages of breast cancer, whereas “Stage II/III by MRI criteria” refers to specific stages within the TNM staging system.

      • The authors should emphasize the 25% rate of conversion from mastectomy to breast conservation and also report the type and nature of axillary lymph node surgery performed. As the authors note in the discussion section, rates of pathologic complete response/RCB scores are less prognostic for hormone-receptor-positive breast cancer than other subtypes, so one of the main rationales for neoadjuvant medical therapy is for surgical downstaging. This is a clinically relevant outcome.

      We appreciate your constructive comments. Based on your suggestions, we have made the following revisions and additions to the article.

      The breast conservation rate serves as a secondary endpoint in our study (Lines 62 and 179). We have highlighted the significant 25% conversion rate from mastectomy to breast conservation in both the results (Lines 229-230) and discussion sections (Lines 290-292).

      In our study, all patients underwent lymph node surgery, including sentinel lymph node biopsy or axillary lymph node dissection. Among them, 58.3% of patients (7/12) underwent sentinel lymph node biopsies.

      We agree with your point that the prognostic value of pathologic complete response/RCB score is lower for hormone receptor-positive breast cancer than for other subtypes. We have revised the discussion section to clarify that one of the principal objectives of neoadjuvant therapy in this patient population is to facilitate downstaging and enhance the rate of breast conservation (Lines 289-290). We have also emphasized that this neoadjuvant therapeutic regimen appeared to improve the likelihood of pathological downstaging and of achieving a margin-free resection, particularly for those with locally advanced and high-risk breast cancer (Lines 293-295).

      Reviewer #2 (Public review):

      Firstly, as this is a single-arm preliminary study, we are curious about the order of radiotherapy and the endocrine therapy. Besides, considering the radiotherapy, we also concern about the recovery of the wound after the surgery and whether related data were collected.

      Thanks for the comments. The treatment sequence in this study is to first administer radiotherapy, followed by endocrine therapy. A meta-analysis has indicated that concurrent radiotherapy with endocrine therapy does not significantly impact the incidence of radiation-induced toxicity or survival rates compared to a sequential approach [1]. In light of preclinical research suggesting enhanced therapeutic efficacy when radiotherapy is delivered prior to CDK4/6 inhibitors, we have opted to administer radiotherapy before the combination therapy of CDK4/6 inhibitors and hormone therapy [2].

      In our study, we collected data on surgical wound recovery. All 12 patients had Class I incisions, which healed by primary intention. The wounds exhibited no signs of redness, swelling, exudate, or fat necrosis.

      References:

      (1) Li Y F, Chang L, Li W H, et al. Radiotherapy concurrent versus sequential with endocrine therapy in breast cancer: A meta-analysis[J]. The Breast, 2016, 27: 93-98.

      (2) Petroni G, Buqué A, Yamazaki T, et al. Radiotherapy delivered before CDK4/6 inhibitors mediates superior therapeutic effects in ER+ breast cancer[J]. Clinical Cancer Research, 2021, 27(7): 1855-1863.

      Secondly, in the methodology, please describe the sample size estimation of this study and follow up details.

      Thanks for pointing out this crucial omission. Sample size estimation for this study and follow-up details have been added in the methodology section. The section on sample size estimation has been revised to state in Statistical analysis: “This exploratory study involves 12 patients, with the sample size determined based on clinical considerations, not statistical factors (Lines 210-211).” The section on follow up has been revised to state in Procedures section “A 5-year follow-up is conducted every 3 months during the first 2 years, and every 6 months for the subsequent 3 years. Additionally, safety data are collected within 90 days after surgery for subjects who discontinue study treatment (Lines 169-172).”

      Thirdly, in Table 1, the item HER2 expression, it's better to categorise HER2 into 0, 1+, 2+ and FISH-.

      Thank you very much for pointing out this issue. The item HER2 expression in Table 1 has been revised from “negative, 1+, 2+ and FISH-” to “0, 1+, 2+ and FISH-”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I can find no problems with the experiments performed in this study, but there are several results that are not easily explained. I would like to see more consideration of possible explanations. For example, one of the major differences between the CESA structure from primary and secondary cell walls is the displacement of TM7 in the primary cell wall CESAs that leads to the formation of a lipid-exposed channel. Why does this vary between primary and secondary cell wall CESA proteins? Could it explain differences in the properties, such as crystallinity between primary and secondary cell wall cellulose?

      At this time, the different position of TM helix 7 observed in our GmCesA structures is just an observation. We have some emerging evidence that this helix is also flexible in POCesA8 under certain conditions; however, we do not know whether this affects catalytic activity or cellulose coalescence. We have revised the text to avoid the interpretation that TM 7 repositioning is a characteristic feature of primary cell wall CesAs only.

      Similarly, regarding the formation of the larger structures from mixtures of different CESA trimers. Why do they not form rosettes? Particularly as these appear to be forming 2-dimensional structures.

      We have included additional data on the interaction between different CesA isoform trimers (Figure 6). To answer the reviewer’s question, the most likely reasons for not observing closely packed rosette-like structures are (a) steric interferences between the micelles harboring the individual CesA trimers, and (b) the lack of a stabilizing cellulose fiber. This interpretation is supported by 2D class averages of dimers of CesA1 and CesA3 trimers (now shown in Fig. 6). The class averages show an ‘upside-down and side-by-side’ orientation of the two trimers, consistent with interferences between the solubilizing detergent micelles. The implications of this non-physiological arrangement are discussed in the revised manuscript. In a biological membrane, the CesA trimers are confined to the same plane in the same orientation, which is likely necessary to form ordered arrangements.

      What role does the NTD play in trimer formation given its apparent very high class specificity?

      We have no data suggesting any contribution of the NTD to trimer formation. Recent work on moss CesA5 and similar AlphaFold predictions suggest that, for some CesAs, an extreme N-terminal region can interact with the beta sheet of the catalytic domain via beta-strand augmentation. Whether this interaction can contribute to CesA-CesA interactions remains unknown.

      Reviewer #2 (Recommendations For The Authors):

      The authors provide PDB codes but not EMDB codes for the EM maps, also I would encourage the authors to upload the raw micrographs to the EMPIAR database.

      The EMDB codes are shown in Table 1 and data transfer to EMPIAR is ongoing.  

      Page 6 line 144, the statement "All CesA isoforms show greatest catalytic activity at neutral pH" seems to contradict the data in Figure 1e and the subsequent statements. This sentence should be removed.

      The text has been revised to indicate that CesA1 and CesA6 show highest activity under mild alkaline conditions.  

      Page 6, line 150, the authors state "The affinities for substrate binding range from 1.4 mM for CesA1 to 0.6 and 2.4 mM for CesA3 and CesA6, respectively." How were the affinities determined? Is this the affinities or the Michaelis constants? Is it known whether CesAs are rapid equilibrium enzymes? This should be clarified.

      The text now states that we performed Michaelis-Menten kinetics using the ‘UDP-Glo’ glycosyltransferase assay kit. We are uncertain about whether CesAs can be classified as rapid equilibrium enzymes. The rate-limiting step of cellulose biosynthesis has been proposed to be glycosyl transfer, rather than cellulose translocation. To avoid any confusion, we changed the text from '…reveals Michaelis Menten constants for substrate binding of CesA1 and CesA3' to '…reveals Michaelis Menten constants for CesA1 and CesA3 with respect to UDP-Glc'.

      Page 6, line 153, the authors state "CesA1's apparent Ki for UDP is roughly 0.8 mM, whereas this concentration is increased to about 1.2 to 1.5 mM for CesA6 and CesA3, respectively." From the Figure 1g legend, it appears that the authors performed additional experiments at different UDP-Glc concentrations in order to determine Ki that are not shown. This data should be included as a figure supplement as the data presented are insufficient to determine Ki (only IC50).

      The UDP inhibition data show apparent IC50 values, and this has been corrected in the text. For each CesA isoform, the titration was done at one UDP-Glc concentration only.    
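
      To illustrate why a titration at a single UDP-Glc concentration yields an apparent IC50 rather than a Ki, the following minimal sketch applies the Cheng-Prusoff relation, Ki = IC50 / (1 + [S]/Km). It assumes purely competitive inhibition by UDP, which is an assumption made here for illustration and not a claim from the manuscript. The IC50 and Km inputs echo the approximate CesA1 values quoted above; the UDP-Glc concentration and the function name are hypothetical.

```python
def ki_from_ic50(ic50_mM: float, substrate_mM: float, km_mM: float) -> float:
    """Convert an IC50 to a Ki, assuming purely competitive inhibition.

    Cheng-Prusoff relation: Ki = IC50 / (1 + [S] / Km).
    All concentrations must be in the same unit (mM here).
    """
    if ic50_mM <= 0 or substrate_mM < 0 or km_mM <= 0:
        raise ValueError("IC50 and Km must be positive; [S] must be non-negative")
    return ic50_mM / (1.0 + substrate_mM / km_mM)


if __name__ == "__main__":
    # IC50 ~0.8 mM and Km ~1.4 mM are the approximate CesA1 values quoted in
    # the review; the 1.4 mM UDP-Glc concentration is a hypothetical choice.
    ki = ki_from_ic50(ic50_mM=0.8, substrate_mM=1.4, km_mM=1.4)
    print(f"Ki under the competitive assumption: {ki:.2f} mM")
```

      Under this assumption the IC50 equals Ki only when the substrate concentration is negligible relative to Km, which is why the two quantities should not be used interchangeably.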

      Page 8, line 202, the authors state that TM helix 7 of the primary cell wall CesAs is more flexible "as evidenced by weaker density." The density for the TM helix 7 should be shown. If the density shown in Supplementary Figure 3 corresponds to TM helices the number of the helices should be indicated as it is not immediately obvious from the amino acid residue numbers.

      The densities for TM helix 7 of all CesA isoforms are shown in Supplemental Figure 3. The helices are now labeled to orient the reader.  

      Reviewer #2 (Public Review)

      The authors demonstrate via truncation that the N-terminus of the CesA is not involved in the interactions between the isoforms and propose that the CSR hook-like extensions are the primary mediator of trimer-trimer interactions. This argument would be strengthened by equivalent truncation experiments in which the CSR region is removed.

      We performed the suggested experiment. We replaced the CSR in N-terminally truncated GmCesA1 and GmCesA3 with a 20-residue long linker. The resulting constructs assemble into homotrimeric complexes as observed for the wild type and only N-terminally truncated versions. However, the CSR-truncated constructs of the different isoforms do not interact with each other in vitro. Further, CSR-deleted GmCesA3 also does not interact with full-length CesA1, suggesting that two CSR domains of different isoforms are necessary for homotrimer interaction. This data is now shown as Fig. 5.  

      Reviewer #3 (Recommendations For The Authors):

      Major Points

      (1) The authors state on Line 354 that they were unable to isolate heterotrimers, but they need to provide the data to support this claim; for example, it is important for readers to understand whether co-expression of all three CESAs leads to only homotrimers or only monomers. This information is essential to exclude model C in Figure 6.

      We have revised the corresponding discussion and toned down the statement that heterotrimeric complexes did not form in our recombinant expression system. Co-expression of differently tagged secondary or primary cell wall CesAs in Sf9 cells has consistently resulted in negligible amounts of material that can be purified sequentially over different affinity matrices (corresponding to the tags on the recombinantly expressed CesAs – His, Strep, Flag). While this does not exclude the formation of a small fraction of hetero-oligomeric complexes (which could be trimers as observed in the structures or monomers interacting via their CSR regions), it demonstrates that CesAs favor the same isoform for trimer formation, rather than partnering with other isoforms. An example of such a purification is now shown as Supplemental Figure 8.

      Determining whether heterotrimers are formed upon co-expression of different CesA isoforms requires high resolution structural analysis because co-purification of different isoforms can also be due to interactions between different homo-trimeric complexes, as demonstrated in this study.

      While we cannot exclude that factors exist in planta that may prevent the formation of homotrimers and favor the formation of hetero-trimers, it is important to keep in mind that currently no experimental data supports the formation of hetero-trimeric complexes. Instead, our work demonstrates that existing data on CesA isoform interactions can be explained by the interaction of homotrimers of different isoforms.

      (2) The evidence that the products of GmCESA1, GmCESA3, and GmCESA6 homotrimers are cellulose is that they consume UDP-glucose and produce a beta-glucanase-sensitive product. Other beta-glucans synthesized by similar GT2 family proteins (e.g. CSLDs, Yang et al., 2020 Plant Cell or CSLCs, Kim et al., 2020 PNAS) would be sensitive to this enzyme, and the product cannot truly be called cellulose unless it forms microfibrils. Previous reports of CESA activity in vitro have demonstrated that the products form genuine cellulose microfibrils rather than amorphous beta-glucan (via electron microscopy); extensively documented that the product is sensitive to beta-glucanase, but not other enzymes (e.g., callose or MLG degrading enzymes); provided linkage analysis of the product to conclusively demonstrate that it is a beta-1,4-linked glucan; and documented a loss of activity when key catalytic residues were mutated (Purushotham et al., 2016 PNAS; Cho et al., 2017 Plant Phys; Purushotham et al., 2020 Science).

      Other GT2 characterization efforts have documented activity to similar standards (e.g. CSLDs, Yang et al., 2020 Plant Cell or CSLFs, Purushotham et al., 2022 Science Advances). At least one independent method should be provided, and the TEM of the product is necessary for readers to appreciate whether the product forms true cellulose microfibrils.

      There may be some confusion regarding the nomenclature. Therefore, we revised the second sentence of the Introduction to define ‘cellulose’ as a beta-1,4 linked glucose polymer, in accordance with the ‘Essentials of Glycobiology’. This is also consistent with enzyme nomenclature as the primary product of cellulose synthase is a single glucose polymer, and not a fibril. For example, most bacterial cellulose synthases only produce amorphous (single chain) cellulose. 

      We show that the GmCesA products can be degraded with a beta-1,4 specific glucanase (cellulase), which demonstrates the formation of authentic cellulose. This study does not focus on the formation of fibrillar cellulose apart from suggesting a revised model for a microfibril-forming CSC.

      (3) The position of isoxaben-resistant mutations implies that primary cell wall CESAs form heterotrimers (Shim et al., 2018 Frontiers in Plant Biology). Indeed, in their previous description of the POCESA8 structure (Purushotham et al., 2020 Science), the authors discussed the position of isoxaben-resistant mutations as a way to justify the way that TM7 of one CESA can contribute to forming the cellulose translocation pore in the neighbouring CESA within a heterotrimer. However, in this manuscript, the authors document a different location for TM7 in the GmCESA1, GmCESA3, and GmCESA6 homotrimers, which would change the position of these resistance mutations. Please discuss.

      As stated in the manuscript, we do not know what the functional implication of the TM7 flexibility may be, but we speculate that it could affect the alignment of the synthesized cellulose polymers. Regarding the previously reported POCesA8 structure, the mapping of one of the reported isoxaben resistance mutants to the C-terminus of TM7 was not used to justify the structure; the structure with its position of TM7 stands on its own.  Considering recent observations suggesting that isoxaben may affect cellulose biosynthesis via secondary effects, we prefer not to speculate on the mechanism by which these mutations cause the apparent resistance to isoxaben (PMID: 37823413).

      (4) The authors present no evidence that GmCESA1/3/6 are involved in primary cell wall synthesis. Please include gene expression information (documenting widespread expression consistent with primary CESAs) and rigorous molecular phylogenetic analysis (or references to these published data) to clarify that these are indeed primary cell wall CESAs.

      This has been addressed. We have included additional figures (Fig. 1 and S1B) that show the strong and wide distribution of the selected CesAs in soybean leaves, their co-expression with primary cell wall markers, and their phylogenetic clustering with Arabidopsis primary cell wall CesAs.  

      (5) Several small changes need to be made to the abstract to ensure that it aligns with the data: Line 28: add "in vitro" after "their assembly into homotrimeric complexes" Line 28: change "stabilized by the PCR" to "presumably stabilized by the PCR".

      We inserted ‘in vitro’ as requested. We did not insert the second modification as requested since CesA trimers are stabilized by the PCR. This is a fact arising from several experimentally determined CesA trimer structures.  

      (6) In all graphs in all figures it is unclear what the sample size is and what the bars represent. These must be stated in the figure legends. It is best practice to plot individual data points so that readers can easily interpret both the sample size and the variation.

      The sample sizes and error bars are now defined in the relevant figure legends.

      (7) The methods need to unambiguously define GmCESA1, GmCESA3, GmCESA6 protein identities using appropriate accession numbers.

      The accession codes are now provided in the Methods.

      Minor Points

      (1) Does CESA1 have higher activity in Figure 1D because of the pH at which the assay was conducted (see Figure 1E)? Could this difference in activity or pH preference have also affected their capacity to resolve TM7 of CESA1?

      We consistently observe higher in vitro catalytic activity of CesA1, compared to CesA3 and CesA6. Activity assays are performed at a pH of 7.5, roughly halfway between the activity maxima of CesA3 and CesA1/6. At this pH, we expect activity differences to arise from factors other than the buffer pH. As detailed above, we do not know whether the conformational flexibility of TM helix 7 affects catalytic activity.

      (2) Line 55: The authors should cite additional papers that also provide insight into CESA structure (e.g. Qiao et al 2021 PNAS).

      A recent publication on moss CesA5 has been included. Qiao et al unfortunately report on a dimeric assembly of a fragment of Arabidopsis thaliana’s CesA3 catalytic domain, which we consider non-physiological. We added a brief statement in the Discussion explaining that our GmCesA3 structure is inconsistent with the dimeric arrangement reported by Qiao et al.

      (3) Line 95: these references are about secondary cell wall CESA isoforms, but there are more appropriate references for the primary CESAs that should be included in place of these papers.

      Fagard et al report on growth defects in roots and dark-grown hypocotyls linked to Arabidopsis CesA1 and CesA6, which are primary cell wall CesAs. Nevertheless, we have included two additional recent publications from the Meyerowitz and Persson labs.

      (4) Line 121-122: Please cite a specific figure that supports this claim, since the (Purushotham et al., 2020) reference refers to POCESA8 enrichment results, but the claims are about the GmCESA1/3/6 enrichment.

      The POCesA8 reference has been removed. The classification into monomers and trimers arises from the data processing described in this manuscript and is consistent with similar results obtained for POCesA8.

      (5) Line 314: It is more appropriate to use "enzyme activity" rather than "cellulose synthesis".

      We prefer to use cellulose biosynthesis since the enzyme produces cellulose.

      (6) Figure 1: please add colour to the graphs to clarify which trend lines belong to which data series (especially Figure 1G).

      The figure (now Fig. 2) has been revised as suggested.  

      (7) Figure 2D: It's not clear which parts are GmCESA and which are POCESA8; please clarify the figure legend.

      Thank you, the legend has been revised accordingly (now Fig. 3).

      (8) In Figure 5, It's not clear that the one CESA is maintained at a steady concentration throughout the assay since there is only a bar for that CESA at the highest concentration (e.g. in Figure 5A, the blue bar for CESA1 only appears on the right-most assay, but there was CESA1 in all assays, so this should be indicated).

      In the panel the reviewer is referring to, the blue bar corresponds to the activity measured for only CesA1 at a concentration of 20 µM. The red columns (indicated as ‘Mix’) represent the activities measured in the presence of 20 µM of CesA1 plus increasing concentrations of CesA3. The purple columns represent activities obtained for only CesA3 at the indicated concentrations. Numerical addition of the activities of CesA1 alone at 20 µM (blue column) and CesA3 alone (purple columns) gives rise to the gray columns, now indicated by a capital ‘sigma’ sign. We are unclear on how the figure could be improved, but we have revised the legend to avoid confusion.

      (9) Figure 5 legend needs to be clarified to indicate whether monomers or homotrimers were used in the assays.

      This is now shown as Fig. 7 and the legend has been revised as requested. The experiments were performed with the trimeric CesA fractions.

      (10) There seem to be some random dots near the top of Figures 6B & 6C

      Removed. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewers' thoughtful comments and suggestions. Below, we provide point-by-point responses to the recommendations and outline the updates made to the manuscript.

      (1) Discussion, "the obvious experiment is to manipulate a neuron's anatomical embedding while leaving stimulus information intact." The epiphenomenon can arise from the placement and types of a neuron's neurotransmitters and neuromodulators, too.

      The content of vesicles released by a neuron is obviously of great importance in determining postsynaptic impact. However, we’re suggesting that (assuming vesicular content is held constant) the anatomically-relevant patterning of spiking might additionally affect the postsynaptic neuron’s integration of the presynaptic input. To avoid confusion, we updated the text accordingly: “the obvious experiment is to manipulate a neuron's anatomical embedding while minimally impacting external and internal variables, such as stimulus information and levels of neurotransmitters or neuromodulators” (Line 594 - 596).

      (2) "In all conditions, the slope of the input duration versus sensitivity line was still positive at 1,800 seconds (Fig. 3B)". This may suggest that the estimate of the calculated statistics (ISI, PSTH) is more reliable with more data, rather than (or in addition to) specific information being extracted from faraway time points. Another potential confound is the training statistics were calculated from all training data, so the test data is a better match to training data when test statistics are calculated from more data. Overall, the validity of the conclusions following this observation is not clear to me.

      This is a great point. Accordingly, we revised the text to include this possibility: “Because the training data were of similar duration, this could be explained by either of two possibilities. First, the signal is relatively short, but noisy—in this case, extended sampling will increase reliability. Second, the anatomical signal is, itself, distributed over time scales of tens to hundreds of seconds.” (Line 252 - 255).
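
      The first possibility can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the analysis used in the manuscript: it assumes a stationary Poisson spike train (exponential ISIs), and the function names, firing rate, and durations are hypothetical choices made only for illustration. It shows that the spread of a simple statistic (the mean ISI) shrinks as the sampled duration grows, even though no information is carried at long time scales.

```python
import random
from statistics import mean, stdev


def mean_isi_estimate(rate_hz, duration_s, rng):
    """Simulate one Poisson spike train and return its mean ISI (seconds)."""
    elapsed, isis = 0.0, []
    while True:
        isi = rng.expovariate(rate_hz)  # exponential ISI with mean 1 / rate_hz
        elapsed += isi
        if elapsed > duration_s:
            break
        isis.append(isi)
    return mean(isis)


def estimate_spread(rate_hz, duration_s, n_repeats=100, seed=0):
    """Standard deviation of the mean-ISI estimate across repeated recordings."""
    rng = random.Random(seed)
    estimates = [mean_isi_estimate(rate_hz, duration_s, rng) for _ in range(n_repeats)]
    return stdev(estimates)


if __name__ == "__main__":
    # Hypothetical 5 Hz unit sampled for 10 s versus 500 s.
    print("spread at 10 s :", estimate_spread(5.0, 10.0))
    print("spread at 500 s:", estimate_spread(5.0, 500.0))
```

      A stationary process of this kind cannot distinguish the two possibilities by itself; it only shows that improved reliability with longer sampling is expected under the first.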

      (3) "This further suggests that there is a latent neural code for anatomical location embedded within the spike train, a feature that could be practically applied to determining the brain region of a recording electrode without the need for post-hoc histology". The performance of the model at the subregion level, which is a typical level of desired precision in locating cells, does not seem to support such a practical application. Please clarify to avoid confusion.

      The current model should not be considered a replacement for traditional methods, such as histology. Our intention is to convey that, with the inclusion of multimodal data and additional samples, a computational approach to anatomical localization has great promise. We updated the manuscript to clarify this point: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Additionally, we directly addressed this point in our original manuscript (Discussion section: Line 498 - 505 in the current version). Furthermore, following the release of our preprint, independent efforts have adopted a multimodal strategy with qualitatively similar results (Yu et al., 2024). Other recent work expands on the idea of utilizing single-neuron features for brain region/structure characterization (Le Merre et al., 2024).

      Yu, H., Lyu, H., Xu, E. Y., Windolf, C., Lee, E. K., Yang, F., ... & Hurwitz, C. (2024). In vivo cell-type and brain region classification via multimodal contrastive learning. bioRxiv, 2024-11.

      Le Merre, P., Heining, K., Slashcheva, M., Jung, F., Moysiadou, E., Guyon, N., ... & Carlén, M. (2024). A Prefrontal Cortex Map based on Single Neuron Activity. bioRxiv, 2024-11.

      (4) "These results support the notion the meaningful computational division in murine visuocortical regions is at the level of VISp versus secondary areas.". The use of the word "meaningful" is vague and this conclusion is not well justified because it is possible that subregions serve different functional roles without having different spiking statistics.

      Precisely! It is well established that different subregions serve different functional purposes - but they do not necessitate different regional embeddings. It is important to note the difference between stimulus encoding and the embedding that we are describing. As a rough analogy, the regional embedding might be considered a language, while the stimulus is the content of the spoken words. However, to avoid vague words, we revised the sentence to “These results suggest that the computational differentiability of murine visuocortical regions is at the level of VISp versus secondary areas.” (Line 380 - 381)

      (5) Figure 3D left/right halves look similar. A measure of the effect size needs to accompany these p-values.

      We assume the reviewer is referring to Figure 3E. Although some of the violin plots in Figure 3E look similar, they are not identical. In the revision, we include effect sizes in the caption.

      (6) Figure 3A, 3F: Could uncertainty estimates be provided?

      Yes. We added uncertainty estimates to the text (Line 272 - 294) and to the caption of Figure S2, which displays confusion matrices corresponding to Figure 3A. The inclusion of similar estimates for 3F would be so unwieldy as to be a disservice to the reader—there are 240 unique combinations of stimulus parameters and structures. In the context of the larger figure, 3F serves to illustrate a relationship between stimulus, region, and the anatomical embedding.

      (7) Page 21. "semi-orthogonal". Please reword or explain if this usage is technical.

      We replaced “semi-orthogonal” with “dissociable” (Line 549).

      (8) Page 11, "This approach tested whether..." Unclear sentence. Please reword.

      We changed “This approach tested whether the MLP’s performance depended on viewing the entire ISI distribution or was enriched in a subset of patterns” to “This approach identified regions of the ISI distribution informative for classification” (Line 261).

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments and summary of the results. We agree that the introductory results (Figs. 1-3) are not particularly compelling when considered in isolation. They provide a baseline of comparison for the subsequent results. Our intention was to approach the problem systematically, progressing from well-established, basic methods to more advanced approaches. This allows us to clearly test a baseline and avoid analytical leaps or untested assumptions. Specifically:

      ● Figure 1 provides an evaluation of the standard dimensionality reduction methods. As expected, these methods yield minimal results, serving as a clear baseline. This is consistent, for example, with an understanding of single units as rate-varying Poisson processes.

      ● Figures 2 and 3 then build upon these results in two steps: first, linear supervised methods applied to spiking features common in the neuroscience literature (firing rate, coefficient of variation, etc.); second, nonlinear supervised machine learning methods applied to more detailed spiking features, such as the ISI distribution.

      By starting from the standpoint of the status quo, we are better able to contextualize the significance of our later findings in Figures 4–6.
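
      To make the distinction between the two feature families concrete, the sketch below computes both from a list of spike timestamps: summary metrics (firing rate, coefficient of variation of the ISIs) and a normalized, log-binned ISI histogram. It is illustrative only and is not the feature pipeline from the manuscript; the function name, the bin count, and the ISI range are assumptions.

```python
import math
from statistics import mean, stdev


def isi_features(spike_times, n_bins=10, min_isi=1e-3, max_isi=10.0):
    """Return (firing rate in Hz, CV of the ISIs, normalized log-binned ISI histogram)."""
    times = sorted(spike_times)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0] if times else 0.0
    if len(isis) < 2 or duration <= 0:
        raise ValueError("need at least three spikes spanning a nonzero duration")

    rate = len(isis) / duration      # intervals per second between first and last spike
    cv = stdev(isis) / mean(isis)    # 0 for a perfectly regular train

    # Histogram over logarithmically spaced bins (bin range is an assumed choice).
    log_lo, log_hi = math.log10(min_isi), math.log10(max_isi)
    width = (log_hi - log_lo) / n_bins
    counts = [0] * n_bins
    for isi in isis:
        if isi <= 0:
            continue  # skip duplicate timestamps
        index = int((math.log10(isi) - log_lo) / width)
        counts[min(max(index, 0), n_bins - 1)] += 1  # clamp out-of-range ISIs
    total = sum(counts)
    return rate, cv, [count / total for count in counts]


if __name__ == "__main__":
    regular_train = [0.05 * i for i in range(11)]  # hypothetical 20 Hz regular unit
    print(isi_features(regular_train))
```

      The summary metrics collapse the spike train to a few numbers, whereas the histogram retains the shape of the ISI distribution, which is the additional information available to the nonlinear classifier.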

      Response to Specific Points in the Summary

      (6) Separability of VISp vs. Secondary Visual Areas

      I found the entire argument about visual areas somewhat messy and unclear. The stimuli used might not drive the secondary visual areas particularly well and might necessitate task engagement.

      We appreciate your feedback that the dissection of visual cortical structures is unclear. To summarize, as shown in the bottom three rows of Figure 6, there is a notable lack of diagonality in visuocortical structures. This means that our model was unable to learn signatures to reliably predict these classes. In contrast, visuocortical layer is returned well above chance, and superstructures (primary and secondary areas) are moderately well identified, albeit still well above chance.

      Consider a thought experiment, if Charlie Gross had not shown faces to monkeys to find IT, or Newsome and others shown motion to find MT and Zeki and others color stimuli to find V4, we would conclude that there are no differences.

      The thought experiment is misleading. The results specifically do not arise from stimulus selectivity—much of Newsome’s own work suggests that the selectivity of neurons in IT etc. is explained by little more than rate varying Poisson processes. In this case, there should be no fundamental anatomical difference in the “language” of the neurons in V4 and IT, only a difference in the inputs driving those neurons. In contrast, our work suggests that the “language” of neurons varies as a function of some anatomical divisions. In other words, in contrast to a Poisson rate code, our results predict that single neuron spike patterns might be remarkably different in MT and IT— and that this is not a function of stimulus selectivity. Notably, the anatomical (and functional) division between V1 and secondary visual areas does not appear to manifest in a different “language”, thus constituting an interesting result in and of itself.

      We regret a failure to communicate this in a tight and compelling fashion on the first submission, but hope that the revision is limpid and accessible.

      Barberini, C. L., Horwitz, G. D., & Newsome, W. T. (2001). A comparison of spiking statistics in motion sensing neurones of flies and monkeys. Motion Vision: Computational, Neural, and Ecological Constraints, 307-320.

      Bair, W., Zohary, E., & Newsome, W. T. (2001). Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience, 21(5), 1676-1697.

      Similarly, why would drifting gratings be a good example of a stimulus for the hippocampus, an area thought to be involved in memory/place fields?

      The results suggest that anatomical “language” is not tied to stimuli. It is imperative to recall that neurons are highly active absent experimentally imposed stimuli, such as when an animal is at rest, when an animal is asleep, and when an animal is in the dark (relevant to visual cortices). With this in mind, also recall that, despite the lack of stimuli tailored to the hippocampus, neurons therein were still reliably separable from neurons in seven nuclei in the thalamus, six of which are not classically considered visual regions. Should these regions (including hippocampus) have been inert during the presentation of visual stimuli, there would have been very little separability.

      (7) Generalization across laboratories

      “[C]omparison across laboratories was somewhat underwhelming. It does okay but none of the results are particularly compelling in terms of performance.”

      Any result above chance is a rejection of the null hypothesis: that a model trained on a set of animals in Laboratory A will be ineffective in identifying brain regions when tested on recordings collected in Laboratory B (in different animals and under different experimental conditions). As an existence proof, the results suggest conserved principles (however modest) that constrain neuronal activity as a function of anatomy. That models fail to achieve high accuracy in this context is not surprising, given the limitations of available recordings. That models achieve anything above chance, however, is.
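
      The logic of testing against chance can be sketched with an exact one-sided binomial test. This is an illustration, not the statistical procedure reported in the manuscript: the function name and the example counts are hypothetical, and the test assumes independent predictions and a known chance level (for balanced classes, one divided by the number of classes).

```python
from math import comb


def p_above_chance(n_correct, n_total, chance):
    """Exact one-sided binomial p-value: P(X >= n_correct) when true accuracy equals chance."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


if __name__ == "__main__":
    # Hypothetical example: 40 of 100 units classified correctly among 4 balanced classes.
    print(p_above_chance(40, 100, 0.25))
```

      Accuracy can therefore be far from perfect and still be very unlikely under the null, which is the sense in which a modest cross-laboratory result remains informative.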

      Thus, after reading the paper many times, I think part of the problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding.

      We demonstrate that neuronal spike trains carry robust anatomical information. We developed an ML architecture for this and that architecture is publicly available.

      They try to split the middle and I am left somewhat perplexed about what exact scientific problem they or other researchers are solving.

      We humbly suggest that the question of a neuron’s “language” is highly important and central to an understanding of how brains work. From a computational perspective, there is no reason for a vast diversity of cell types, nor a differentiation of the rules that dictate neuronal activity in one region versus another. A Turing-complete system can be trivially constructed from a small number of simple components, such as an excitatory and inhibitory cell type. This is the basis of many machine learning tools.

      Please do not confuse stimulus specificity with the concept of a neuron’s language. Neurons in VISp might fire more in response to light, while those in auditory cortex respond to sound. This does not mean that these neurons are different - only that their inputs are. Given the lack of a literature describing our main effect—that single neuron spiking carries information about anatomical location—it is difficult to conclude that our results are either commonplace or to be expected.

      I am also unsure why the authors think some of these results are particularly important.

      See above.

      For instance, has anyone ever argued that brain areas do not have different spike patterns?

      Yes. In effect, by two avenues. The first is a lack of any argument otherwise (please do not conflate spike patterns with stimulus tuning), and the second is the preponderance of, e.g., rate codes across many functionally distinct regions and circuits.

      Is that not the premise for all systems neuroscience?

      No. The premise for all systems neuroscience (from our perspective) is that a) the brain is a collection of interacting neurons and b) this collective system of neurons gives rise to behavior, cognition, sensation, and perception. As stated above, these axiomatic first principles fundamentally do not require that neurons, as individual entities, obey different rules in different parts of the brain.

      I could see how one could argue no one has said ISIs matter but the premise that the areas are different is a fundamental part of neuroscience.

      Based on logic and the literature, we fundamentally disagree. Consider: while systems neuroscience operates on the principle that brain regions have specialized functions, there is no a priori reason to assume that these functions must be reflected in different underlying computational rules. The simplest explanation is that a single language of spiking exists across regions, with functional differences arising from processing distinct inputs rather than fundamentally different spiking rules. For example, an identical spike train in the amygdala and Layer 5 of M1 would have profoundly different functional impacts, yet the spike timing itself could be identical (even as stimulus response). Until now, evidence for region-specific spiking patterns has been lacking, and our work attempts to begin addressing this gap. There is extensive further work to be conducted in this space, and it is certain that models will improve, rules will be clarified, and mechanisms will be identified.

      Detailed major comments

      (1) Exploratory trends in spiking by region and structure across the population:

      The argument in this section is that unsupervised analyses might reveal subtle trends in the organization of spiking patterns by area. The authors show 4 plots from t-SNE and claim to see subtle organization. I have concerns. For Figure 1C, it is nearly impossible to see if a significant structure exists that differentiates regions and structures. So this leads certain readers to conclude that the authors are looking at the artifactual structure (see Chari et al. 2024) - likely to contribute to large Twitter battles. Contributing to this issue is that the hyperparameter for tSNE was incorrectly chosen. I do think that a different perplexity should be used for the visualization in order to better show the underlying structure; the current visualization just looks like a single "blob". The UMAP visualizations in the supplement make this point more clearly. I also think the authors should include a better plot with appropriate perplexity or not include this at all. The color map of subtle shades of green and yellow is hard to see as well in both Figure S1 and Figure 1.

      In response to the feedback, we replaced t-SNE/UMAP with LDA, while keeping PCA for dimensionality reduction.

      As stated in the original methods, t-SNE/UMAP hyperparameters were chosen based on the combination that led to the greatest classifiable separability of the regions/structures in the space (across a broad range of possible combinations). It just so happens that the maximally separable structure from a regions/structures perspective is the “blob”. This suggests that perhaps the predominant structure the t-SNE finds in the data is not driven by anatomy. If we selected hyperparameters in some other way that was not based specifically on regions/structures (e.g. simple visual inspection of the plots) the conformation would of course be different and not blob-like. However, we removed the t-SNE and UMAP to avoid further confusion.

      The “muddy appearance” is not an issue with the color map. As seen in Figure 1B, the chosen colors are visibly distinct. Figure 1C (previous version) appeared muddy yellow/green because of points that overlap with transparency, resulting in a mix of clearly defined classes (e.g., a yellow point on top of a blue point creating green). This overlap is a meaningful representation of the separability observed in this analysis. We also tried using 2D KDE for visualization, but it did not improve the impression of visual separability.

      We are removing p-values from the figures because they lead to the impression that we over-interpret these results quantitatively. However, we calculated p-values based on label permutation similar to the way R2 suggests (see previous methods). The conflation with the Wasserstein distances is an understandable misunderstanding. These are unrelated to p-values and used for the heatmaps in S1 only (see previous methods).
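
      For readers unfamiliar with label permutation, the general procedure can be sketched as follows. This is a generic illustration rather than the code used for the previous version of the manuscript: the function names are hypothetical, and `mean_gap` is a toy separability statistic standing in for whichever statistic is computed on the dimensionally-reduced data.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, statistic, n_perm=999, seed=0):
    """P-value for `statistic` under random reassignment of labels to values."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between values and labels
        if statistic(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def mean_gap(values, labels):
    """Toy statistic: absolute difference between the means of groups 'A' and 'B'."""
    group_a = [v for v, label in zip(values, labels) if label == "A"]
    group_b = [v for v, label in zip(values, labels) if label == "B"]
    return abs(mean(group_a) - mean(group_b))


if __name__ == "__main__":
    values = list(range(10)) + list(range(100, 110))  # hypothetical 1-D embedding
    labels = ["A"] * 10 + ["B"] * 10
    print(permutation_p_value(values, labels, mean_gap))
```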

      Instead of p-values, we now use the adjusted Rand index, which measures how accurately neurons within the same region are clustered together (see Line 670 - 671, Figure 1C, and Figure S1) (Hubert & Arabie 1985). This quantifies the extent to which the distribution of points in dimensionally-reduced space is shaped by region/structure.

      Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075
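
      For reference, the adjusted Rand index of Hubert & Arabie can be computed from two label vectors as sketched below. This is a minimal standard-library version written for illustration; it is not the implementation used in the manuscript, and the function name and example labels are hypothetical. The index is 1 for identical partitions (up to relabeling) and has an expected value of 0 for random assignments.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two label sequences of equal length >= 2")

    # Pairs of items that share a cell in the contingency table, a row, or a column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions place all items in one group
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    regions = ["VISp", "VISp", "CA1", "CA1"]  # hypothetical anatomical labels
    clusters = [1, 1, 0, 0]                   # hypothetical cluster assignments
    print(adjusted_rand_index(regions, clusters))
```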

      (2) Logistic classifiers:

      The results in this section are somewhat underwhelming. Accuracy is around 40% and yes above chance but I would be very surprised if someone is worried about separating visual structures from the thalamus. Such coarse brain targeting is not difficult. If the authors want to include this data, I recommend they show it as a control in the ISI distribution section. The entire argument here is that perhaps one should not use derived metrics and a nonlinear classifier on more data is better, which is essentially the thrust of the next section.

      As outlined above, our work systematically increases in model complexity. The logistic result is an intermediate model, and it returns intermediate results. This is an important stepping stone between the lack of a result based on unsupervised linear dimensionality reduction and the performance of supervised nonlinear models.

      From a purely utilitarian perspective, the argument could be framed as “one should not use derived metrics, and a nonlinear classifier on more data is better.” However, please see all of our notes above.

      (3) MLP classifiers:

      Even in this section, I was left somewhat underwhelmed that a nonlinear classifier with large amounts of data outperforms a linear classifier with small amounts of data. I found the analysis of the ISIs and which timescales are driving the classifier interesting but I think the classifier with smoothing is more interesting. So with a modest chance level decodability of different brain areas in the visual system, I found it somewhat grandiose to claim a "conserved" code for anatomy in the brain. If there is conservation, it seems to be at the level of the coarse brain organization, which in my opinion is not particularly compelling.

      The sample size used for both the linear and nonlinear classifiers is the same; however, the nonlinear classifier leverages the detailed spiking time information from ISIs. Our goal here was to systematically evaluate how classical spike metrics compare to more detailed temporal features in their ability to decode brain areas. We chose a linear classifier for spike metrics because, with fewer features, nonlinear methods like neural networks often offer only modest advantages over linear methods, are less interpretable, and are prone to overfitting.

      Respectfully, we stand by our word choice. The term “conserved” is appropriate given that our results hold appreciably, i.e., statistically above chance, across animals.

      (4) Generalization section:

      The authors suggest that a classifier learned from one set of data could be used for new data. I was unsure if this was a scientific point or the fact that they could use it as a tool.

      It can be both. We are more driven by the scientific implications of a rejection of the null.

      Is the scientific argument that ISIs are similar across areas even in different tasks?

      It appears so - despite heterogeneity in the tuning of single neurons, their presynaptic inputs, and stimuli, there is identifiable information about anatomical location in the spike train.

      Why would one not learn a classifier from every piece of available data: like LFP bands, ISI distributions, and average firing rates, and use that to predict the brain area as a comparison?

      Because this would obfuscate the ability to conclude that spike trains embed information about anatomy.

      Considering all features simultaneously and adding additional data modalities—such as LFP bands and spike waveforms—has potential to improve classification accuracy at the cost of understanding the contribution of each feature. The spike train as a time series is the most fundamental component of neuronal communication. As a result, this is the only feature of neuronal activity of concern for the present investigation.

      Or is the argument that the ISIs are a conserved code for anatomy? Unfortunately, even in this section, the data are underwhelming.

      We appreciate the reviewer’s comments, but arrive at a very different conclusion. We were quite surprised to find any generalizability whatsoever.

      Moreover, for use as a tool, I think the authors need to seriously consider a control that is either waveforms from different brain areas or the local field potentials. Without that, I am struggling to understand how good this tool is. The authors said "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc)., our studies involve only the timestamps of individual spikes from well-isolated units ". However, we are not talking about information transmission and actually trying to identify and assess brain areas from electrophysiological data.

      While we are not blind to the “tool” potential that is suggested by our work, this is not the primary motivation or content in any section of the paper. As stated clearly in the abstract, our motivation is to ask “whether individual neurons [...] embed information about their own anatomical location within their spike patterns”. We go on to say “This discovery provides new insights into the relationship between brain structure and function, with broad implications for neurodevelopment, multimodal integration, and the interpretation of large-scale neuronal recordings. Immediately, it has potential as a strategy for in-vivo electrode localization.” Crucially, the last point we make is a nod to application. Indeed, our results suggest that in-vivo electrode localization protocols may benefit from the incorporation of such a model.

      In light of the reviewer’s concerns, we have further dampened the weight of statements about our model as a consumer-ready tool.

      Example 1: The final sentence of the abstract now reads: “Computational approximations of anatomy have potential to support in-vivo electrode localization.”

      Example 2: The Results section now contains the following text: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Example 3: We replaced the phrase "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc) " with the phrase “because information is primarily encoded by the firing rate or the timing of spiking and not waveforms (etc)” (Line 116 - 118).

      (5) Discussion section:

      In the discussion, beginning with "It is reasonable to consider . . ." all the way to the penultimate paragraph, I found the argumentation here extremely hard to follow. Furthermore, the parts of the discussion here I did feel I understood, I heavily disagreed with. They state that "recordings are random in their local sampling" which is almost certainly untrue when it comes to electrophysiology which tends to oversample task-modulated excitatory neurons (https://elifesciences.org/articles/69068). I also disagree that "each neuron's connectivity is unique, and vertebrate brains lack 'identified neurons' characteristic of simple organisms." While brains are only eutelic and "nameable" in only the simplest organisms (C. elegans), cell types are exceedingly stereotyped in their connectivity even in mammals and such connectivity defines their computational properties. Thus I don't find the premise the authors state in the next sentence to be undermined ("it seems unlikely that a single neuron's happenstance imprinting of its unique connectivity should generalize across stimuli and animals"). Overall, I found this subsection to rely on false premises and in my opinion it should be removed.

      At the suggestion of R2, we removed the paragraph in question. However, we would like to address some points of disagreement:

      We agree that electrophysiology, along with spike-sorting, quality metrics, and filtering of low-firing neurons, leads to oversampling of task-modulated neurons. However, when we stated that recordings are random in their local sampling, we were referring to structural (anatomical) randomness, not functional randomness. In other words, the recorded neurons were not specifically targeted (see below).

      Electrode arrays, such as Neuropixels, record from hundreds of neurons within a small volume relative to the total number of neurons and the volume of a given brain region. For instance, the paper R2 referenced includes a statement supporting this: “... assuming a 50-μm ‘listening radius’ for the probes (radius of half-cylinder around the probe where the neurons’ spike amplitude is sufficiently above noise to trigger detection) …, the average yield of 116 regular-spiking units/probe (prior to QC filtering) would imply a density of 42,000 neurons/mm³, much lower than the known density of ~90,000 neurons/mm³ for excitatory cells in mouse visual cortex….”

      If we take the estimated volume of V1 to be approximately 3 mm³, this region could theoretically be subdivided into multiple cylinders with a 100-μm diameter. While stereotaxic implantation of the probe mitigates some variability, the natural anatomical variability across individual animals introduces spatially random sampling. This was the randomness we were referring to, and thus, we disagree with the assertion that our claim is “almost certainly untrue.”
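
      The scale of this subdivision can be estimated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, the cylinder height of 1 mm is an assumed value chosen for the example, and packing inefficiency between cylinders is ignored.

```python
import math


def count_cylinders(region_volume_mm3, diameter_um, height_mm):
    """Number of cylinders of the given size whose volumes sum to the region volume."""
    radius_mm = diameter_um / 2 / 1000            # micrometers to millimeters
    cylinder_volume = math.pi * radius_mm**2 * height_mm
    return region_volume_mm3 / cylinder_volume


if __name__ == "__main__":
    # 3 mm^3 region, 100-um diameter; the 1 mm height is an assumption for illustration.
    print(round(count_cylinders(3.0, 100.0, 1.0)))
```

      Under these assumptions, a single probe samples one of several hundred possible recording columns within the region.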

      Additionally, each cortical pyramidal neuron is understood to have ~ 10,000 presynaptic partners. It is highly unlikely that these connections are entirely pre-specified, perfectly replicated within the same animal, and identical across all members of a species. Further, there is enormous diversity in the activity properties of even neighboring cells of the same type. Consider pyramidal neurons in V1. Single neuron firing rates are log normally distributed, there are many combinations of tuning properties (i.e., direction, orientation) that must occupy each point in retinotopic space, and there is powerful experience dependent change in the connectivity of these cells. We suggest that it is inconceivable that any two neurons, even within a small region of V1, have identical connectivity.

      Minor Comments:

      (1) Although the description of confusion matrices is good from a didactic perspective, some of this could be moved to methods to simplify the paper.

      We thank the reviewer for the suggestion. However, given the broad readership of eLife, we gently suggest that confusion matrices are not a trivial and universally appreciated plotting format. For the purpose of accessibility, a brief and didactic 2-sentence description will make the paper far more comprehensible to many readers at little cost to experts.
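
      For readers unfamiliar with the format, the construction of a confusion matrix can be sketched in a few lines. This is a minimal illustration, assuming three hypothetical region labels and made-up classifier outputs; none of the values are data from the study.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels and predictions, for illustration only.
classes = ["V1", "CA1", "LGN"]
true_labels = ["V1", "V1", "V1", "CA1", "CA1", "LGN", "LGN", "LGN"]
predicted = ["V1", "V1", "CA1", "CA1", "CA1", "LGN", "V1", "LGN"]

cm = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, cm):
    print(name, row)
print("accuracy:", accuracy(cm))
```

      Off-diagonal entries show which classes are mistaken for which, information that a single accuracy value does not convey.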

      (2) Figure 3A: It is concluded in their subsequent figure that the longer the measured amount of time, the better the decoding performance. Thus it makes sense why the average PSTHs do not show significant decoding of areas or structures

      That is a good observation. However, all features were calculated from the same duration of data, except in Figure 3B, where we tested the effect of duration. The averaged PSTH was calculated from the same length of data as the ISI distribution and binned to have the same feature length as the ISI distribution (refer to the Methods section). Therefore, we interpreted this as an indication of information degradation through averaging, rather than an effect of data length (Lines 234-237).

      (3) Figure 3D: A Gaussian is used to fit the ISI distributions here but ISI distributions do not follow a normal distribution, they follow an inverse gamma distribution.

      We agree with the reviewer, and we are familiar with the literature showing that the ISI distribution is best fitted by a gamma-family distribution (as a recent, but not the earliest, example: Li et al. 2018). However, we did not fit a Gaussian (or any distribution) to the data; we simply calculated the sample mean and variance. Reporting the sample mean and variance (or standard deviation) is not restricted to Gaussian distributions. They are broadly used metrics that simply have additional intrinsic meaning for Gaussian distributions. We used the schematic illustration in Fig 3D because mean and variance are much more familiar in the context of a Gaussian distribution, but ultimately that does not affect our analyses in Fig 3E-F. Alternatively, the alpha and beta intrinsic parameters of a gamma distribution could have been used, but they are familiar to a much smaller portion of neuroscientists.

      Li, M., Xie, K., Kuang, H., Liu, J., Wang, D., Fox, G. E., ... & Tsien, J. Z. (2018). Spike-timing pattern operates as gamma-distribution across cell types, regions and animal species and is essential for naturally-occurring cognitive states. bioRxiv, 145813.
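
      The point that the sample mean and variance require no Gaussian assumption can be illustrated with a short simulation. This is a minimal sketch, assuming synthetic ISIs drawn from a gamma distribution with arbitrarily chosen shape and scale; it is not the analysis code used in the paper. It also uses the standard method-of-moments relations (shape = mean²/variance, scale = variance/mean) to show that, for a gamma distribution, the two summaries carry the same information as the distribution's own parameters.

```python
import random
import statistics

random.seed(0)

# Hypothetical ground-truth gamma parameters (chosen for illustration).
TRUE_SHAPE = 2.0    # alpha
TRUE_SCALE = 0.05   # seconds; mean ISI = shape * scale = 0.1 s

# Synthetic inter-spike intervals drawn from a gamma distribution.
isis = [random.gammavariate(TRUE_SHAPE, TRUE_SCALE) for _ in range(20_000)]

# Sample mean and variance: plain descriptive statistics, no fit involved.
mean_isi = statistics.fmean(isis)
var_isi = statistics.variance(isis)

# Method-of-moments mapping from (mean, variance) to gamma parameters.
shape_hat = mean_isi ** 2 / var_isi
scale_hat = var_isi / mean_isi

print(f"mean = {mean_isi:.4f} s, variance = {var_isi:.5f} s^2")
print(f"recovered shape = {shape_hat:.2f}, scale = {scale_hat:.4f}")
```

      The recovered shape and scale should lie close to the values used to generate the data, which is the sense in which mean and variance remain informative for non-Gaussian ISI distributions.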

      (4) Figure 3G: Something is wrong with this figure as each vertical bar is supposed to represent a drifting grating onset but yet, they are all at 5 hz despite the PSTH being purportedly shown at many different frequencies from 1 to 15 hz.

      We appreciate your attention to detail, but the vertical bars in this figure do not represent the onset of individual drifting gratings. They represent only the overall start/end of the drifting grating session. We did not intend to signal the temporal frequency of the drifting gratings (or the spatial frequency, orientation, or contrast).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth to explain this mechanism in the paper more explicitly.

      While interesting, this intuition is not correct. The oscillations are generated by the interaction between excitatory and inhibitory nodes in the network and occur in the model even with stationary gain. All of the plots in figure 3 exploring the dynamical regime of the network at different input x gain combinations (i.e., where the oscillatory regime is characterised) are simulations run with stationary gain.

      To ensure that this intuition is more clearly presented in the manuscript, we have edited the description in the text.

      P. 12: “Because of the large size of the network, we could not solve for the fixed points or study their stability analytically. Instead, we opted for a numerical approach and characterised the dynamical regime (i.e. the location and existence of approximate fixed-point attractors) across all combinations of (static) gain and  visited by the network.”

      Reviewer #2 (Public review):

      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative). An analysis of the timing of the effect might have overcome this limitation. For example, in a previous study, the same group showed that fMRI activity in the LC region precedes changes in the energy landscape of fMRI dynamics, which is a step towards investigating causal links between gain modulation, changes in the energy landscape and perceptual switches.

      Thank you for the suggestion, which we considered in detail. Unfortunately, the temporal and spatial resolution of the fMRI data collected for this study precluded the analyses we ran in previous work; however, this is an important question for future work.

      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.

      We agree that this is a limitation of the current study, which we previously highlighted in the methods section.

      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs markedly from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task were naïve as to the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain has to search a vast space of possible second images (it is a shark?, a frog?, a bird?, etc.), rather than comparing the likelihood of just two categories. This search process and the perceptual switch in the task appear to be mechanistically different from the competition between two inputs in the RNN.

      We appreciate the critical analysis of the experimental paradigm but disagree with the reviewer’s conclusions for two key reasons: 1) participants had prior exposure to the images, such that they could form an expectation about which stimulus category a particular image would transition into (i.e., the image could not switch into any possible category); and 2) even if the reviewer’s concern were founded, models of K winner-take-all decision making are structured identically irrespective of whether there are 2 or K options; all that changes is the simulated reaction times, which depend linearly on K (for an example model, see Hugh Wilson’s textbook Spikes, Decisions, and Actions, 1999, p. 89-91). For these reasons, we maintain that the RNN is a sensible representation of the behavioural task.

      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.

      While we agree that the effect is observable with both static and dynamic gain, the dynamic approach has stronger construct validity: it links more directly to the observed pupil dynamics and to a rich literature modelling the behavioural consequences of surprise/uncertainty. This led us to conclude that the dynamic approach was a better representation of our hypothesis.

      - Fig 1C: I don't see a "top grey bar" indicating significance.

      Thank you for catching this. The caption has been amended; the text was from an older version of the manuscript.

      - p. 10, reference to fig 3F seems incorrect: there is Fig 3F upper and Fig 3F lower, and nothing on Fig 3 and its legend mention the lesion of units

      This has been amended. We meant to refer to 2F.

      - In the response letter you mention a MATLAB tutorial, but I could not find it.

      This has been amended. The GitHub repository can be found at https://github.com/ShineLabUSYD/AmbiguousFigures

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region upstream of the operon. Authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, prompted in this work by some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. The model is appealing and several of the experimental data mainly support it. However, it remains unanswered what is the true trigger of the translation arrest at toiL and what is the physiological role of the induced expression of the topAI/yjhQ/yjhP operon.

      Reviewer #2 (Public review):

      Summary:

      Baniulyte and Wade describe how translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      The authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation.

      Weaknesses:

      Future experiments will be needed to better understand the physiological role of the toiL-mediated regulation and elucidate the mechanism of specific antibiotic sensing.

      The results are clearly described, and the revisions have helped to improve the presentation of the data.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors provide convincing data to support an elegant model in which ribosome stalling by ToiL promotes downstream topAI translation and prevents premature Rho-dependent transcription termination. However, the physiological consequences of activating topAI-yjhQP expression upon exposure to various ribosome-targeting antibiotics remain unresolved. The authors have satisfactorily addressed all major concerns raised by the reviewers, particularly regarding the SHAPE-seq data. Overall, this study underscores the diversity of regulatory ribosome-stalling peptides in nature, highlighting ToiL's uniqueness in sensing multiple antibiotics and offering significant insights into bacterial gene regulation coordinated by transcription and translation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - Showing the ribosome density profiles of topAI/yjhQP and toiL in control and tetracycline treated cells is necessary to support that ribosome arrest at toiL increases translation of topAI/yjhQP.

      Figure 7B shows ribosome density around the start of toiL. Ribosome density increases across topAI in the presence of tetracycline, but we have opted not to show this region because we cannot say whether the increase in ribosome occupancy (represented in Figure 7A) is due to an increase in translation efficiency, RNA level, or both.

      - The subinhibitory antibiotic concentrations used in the reporter assays were based on MICs reported in the literature. This is not appropriate since MICs can greatly vary between strains, antibiotic solution stocks, and experimental conditions.

      Reported MICs were used as an initial guide for selecting antibiotic concentrations to test in our reporter assays. We have added text to indicate this, and to highlight that MICs vary considerably between strains.

      - toiL sequence may have evolved to maintain base-pairing with the topAI upstream region rather than, as authors suggest in Discussion, to respond to antibiotic-mediated arrest in an amino acid sequence specific manner.

      We have chosen to frame this as speculation.

      - Authors may consider commenting on the possibility that chloramphenicol does not induce because ToiL lacks alanine residues, whose presence at specific places of a nascent protein have been shown to promote chloramphenicol action (2016 PNAS 113:12150; 2022 NSMB 29:152).

      This is a great point as none of our stalling reporters included an ORF with alanine. We now include a short paragraph in the Discussion section to raise this possibility.

      - Tetracycline was added at the "subinhibitory concentration" of 8 ug/mL for the reporter assays but at 1 ug/mL for the ribosome profiling experiments. Authors should explain what the rationale for this was.

      We think the reviewer is mixing up the epidemiological cut-off value of 8 ug/mL with the concentration used in experiments (0.5-1 ug/mL for reporter assays and ribosome profiling). The text was confusing, so we have added a sentence to the Methods section to indicate that epidemiological cut-off values and MICs were only a guide for selecting antibiotic concentrations to test.

      Reviewer #2 (Recommendations for the authors):

      I wish the authors had been slightly less dismissive of the reviewers' comments. At a minimum, it would be nice if the authors could be consistent about the ribosome representation throughout the manuscript;

      We apologize if our previous responses gave the impression of being dismissive. That was certainly not our intention. We greatly value the reviewers' feedback, and we appreciate the opportunity to clarify any misunderstandings. We believe the reviewer is referring to the different shape and color of the ribosome in Figures 8 and 9, and Figure 8 figure supplement 2, which we have now corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      Comments on revisions:

      Although the authors have revealed the sulfane sulfur content in native MT-3, my question, namely, whether canonical MT-1 and MT-2 contained sulfane sulfur after the induction, has been left unanswered.

      The authors argue that the biological significance of sulfane sulfur in MTs lies in its ability to contribute to metal binding affinity, provide a sensing mechanism against oxidative stress, and aid in the regulation of the protein. Due to their biological roles, induced MT-1 and MT-2 could contain sulfane sulfur in their molecules. Thus, I expect the authors to evaluate or explain the sulfane sulfur content in induced MT-1 and MT-2.

      Thank you for your valuable comments. In this study, we were not able to examine the role of sulfane sulfur in the induced forms of MT-1 and MT-2. However, this topic is undoubtedly important and intriguing; therefore, we will continue to explore it in future studies.

      Reviewer #3 (Public Review):

      Comments on revisions:

      The revised manuscript is only slightly changed from the original, with the inclusion of a supplementary figure (Fig. S2) and minor changes in the text. The authors did not choose to carry out the quantitative Zn binding experiment (which I really wanted to see), but given the complexities of the experiment, I'll let it go.

      Fig. 9: the authors imply in the mechanistic "redox-switch" figure that Trx/TR can not reduce persulfide linkages. A number of groups have shown this to be the case. I recommend modifying the figure legend or text to make this clear to the reader.

      Thank you for your understanding. Regarding the "redox-switch" figure, although some groups have demonstrated the ability of Trx to reduce persulfide moieties, as you pointed out, we have addressed this discrepancy in the Discussion section as follows (lines 357-361): “In contrast, Trx has been proposed to reduce the persulfide moiety of PTP1B (37) and albumin (38, 39). A possible explanation for this discrepancy is that apo-GIF/MT-3-persulfide is rapidly changed into a different conformation that is topologically resistant to Trx reduction. In other words, Trx may exhibit substrate specificity.” Additionally, we have inserted the following sentence just before the above discussion to further clarify this point: “This suggests that the persulfide moiety in GIF/MT-3 appears to be relatively stable against Trx reduction.”

    1. Author response:

      Reviewer 1:

      We thank the reviewer for his/her very positive comments.

      Reviewer 2:

      We thank the reviewer for his/her positive evaluation. We plan to add RNAseq data of yeast wild-type and JDP mutant strains as a more direct readout for the role of Apj1 in controlling Hsf1 activity. We agree with the reviewer that our study includes one major finding: the central role of Apj1 in controlling the attenuation phase of the heat shock response. Like the reviewer, we consider this finding highly relevant and interesting for a broad readership. We agree that additional studies are now necessary to mechanistically dissect how the diverse JDPs support Hsp70 in controlling Hsf1 activity. We believe that such analysis should be part of an independent study, but we will indicate this aspect as part of an outlook in the discussion section of a revised manuscript.

      Reviewer 3:

      We thank the reviewer for his/her suggestions. We agree that it is sometimes difficult to distinguish direct effects of JDP mutants on heat shock regulation from indirect ones, which can result from the accumulation of misfolded proteins that titrate Hsp70 capacity. We also agree that an in vitro reconstitution of Hsf1 displacement from DNA by Apj1/Hsp70 will be important, also to dissect Apj1 function mechanistically. We will add this point as an outlook to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau-via its resultant monsoon system rather than solely its high elevation-has shifted avian migratory directions from a latitudinal to a longitudinal orientation. However, the main claims are incomplete and only partially supported, as the reliance on eBird data-which lacks the resolution to capture population-specific teleconnections-combined with a limited tracking dataset covering only seven species leaves key aspects of the argument underdetermined, and the critical assumption of niche conservatism is not sufficiently foregrounded in the manuscript. More clearly communicating these limitations would significantly enhance the interpretability of the results, ensuring that the major conclusions are presented in the context of these essential caveats.

      We appreciate your positive comments and constructive suggestions. We fully acknowledge your concerns about clearly communicating the limitations associated with the data used and analytical assumptions. We will try to get more satellite tracking data of birds migrating across the plateau. We will carefully consider the insights that our paper can deliver and make sure the limitations of our datasets and the critical assumption of niche conservatism are clearly presented. By explicitly clarifying these caveats, we believe the transparency and interpretability of the findings will be much improved.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for constructive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which will help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable.

      We understand your question about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. We agree that such an approach must be used properly. In the revision, we will explicitly clarify why this counterfactual comparison is useful – namely, it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths. We acknowledge that the counterfactual results are theoretical and will explicitly emphasise the assumptions involved (e.g. species–environment relationships hold between pre- and post-uplift environments) in the main text. Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route). We will also tone down the language around this analysis to avoid overstating its real-world relevance. In summary, we will clarify that the counterfactual analysis is meant to complement, not replace, empirical observations, and we will discuss its limitations so that its role is appropriately bounded in the paper.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      Thank you for your comments. We apologise for any confusion regarding the scope of our dataset. Our main conclusions are not solely derived from seven bird species. Rather, we integrated a full list of 50 bird species that migrate across the QTP and analysed their migratory patterns with eBird data. We studied the factors influencing their choices of migratory routes with seven species that were among the few with available tracking data across the QTP. In this revision, we will clarify the role of these seven species and the rationale for their selection. Additionally, we will attempt to include more satellite tracking data to improve spatial coverage, as recommended by the reviewer and editor. Based on discussions with potential collaborators, we hope to include at least 10 more species with available tracking data.

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we will clearly state the assumptions of niche conservatism in the Introduction.

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study.

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. However, in this study we intend to infer broad-scale movement patterns (e.g. general directions and stopover regions) rather than precise one-to-one population linkages. In the revision, we will carefully rephrase those sections to make clear that our inferences are at the species level and at large spatial scales. We will also explicitly state in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis can only suggest plausible routes and region-to-region linkages. We will contrast migratory routes identified by using eBird data and satellite tracking for the same species to check their similarity. We argue that, even with its limits, the eBird dataset can still yield useful insights (such as identifying major flyway corridors over the QTP).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      Thank you for recognising our efforts in the study. By integrating both satellite tracking and community-contributed data, we explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, yielding the most comprehensive picture to date of how the QTP uplift shapes the migratory patterns of birds. We will also acknowledge the study’s limitations to ensure that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We appreciate your suggestions to incorporate field tracking or radar studies to strengthen our results. All coauthors have years of field experience, including on the QTP and in the Arctic. For example, the tracking data of peregrine falcons (Falco peregrinus) that we will incorporate in the revision were collected during our own fieldwork in the Arctic over more than six years. We agree that more direct tracking (through GPS tagging or radar) would be an ideal way to validate migration pathways and population connectivity. In this revision, as stated above, we will try to include more species with satellite tracking data. We will also note that future studies should build on our findings by using dedicated tracking of more individual birds and radar monitoring of migration over the QTP. We will cite recent advances in these techniques and suggest that incorporating more tracking data could further test the hypotheses generated by our analyses.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We will rewrite this sentence to remove any ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We will remove the sentence to avoid misinterpretation.

      L 158 what is a migration circle? I do not know such a term.

      We will amend it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We will present this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy — where birds rely on energy reserves acquired before breeding — rather than an ‘income’ strategy that depends on food acquired during breeding. However, we note that this interpretation would require further study.” By adding this caution, we will make it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We will also double-check that the rest of the discussion around this point is framed appropriately.


      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study addresses a novel and interesting question about how the rise of the Qinghai-Tibet Plateau influenced patterns of bird migration, employing a multi-faceted approach that combines species distribution data with environmental modeling. The findings are valuable for understanding avian migration within a subfield, but the strength of evidence is incomplete due to critical methodological assumptions about historical species-environment correlations, limited tracking data, and insufficient clarity in species selection criteria. Addressing these weaknesses would significantly enhance the reliability and interpretability of the results.

      We would like to thank you and the two anonymous reviewers for your careful, thoughtful, and constructive feedback on our manuscript. These reviews made us revisit many of our assumptions, and we believe the paper is much improved as a result. In addition to minor points, we have made three main changes to our manuscript in response to the reviews. First, we addressed the concerns about the assumptions of historical species-environment correlations using both theoretical and empirical evidence. Second, we discussed the benefits and limitations of using tracking data in our study and demonstrated how the findings of our study are consistent with the results of previous studies. Third, we clarified our criteria for selecting species in terms of both eBird and tracking data.

      Below, we respond to each comment in turn. Once again, we thank you all for your feedback.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      We are appreciative of the reviewer’s careful reading of our manuscript, encouraging comments and constructive suggestions.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. This relates to the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section reads as quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Yes, the journal requires this format (Methods following the Results and Discussion) for the Short Report article type. As suggested, in the revision we have elaborated on the details of our findings in terms of (i) shifts in the distribution of avian breeding and wintering areas under the influence of the uplift of the Qinghai-Tibet Plateau (Lines 102-116), and (ii) the major factors that shape the current migration patterns of birds in the plateau (Lines 118-138). We have also better referenced the approaches we used in the study.

      Reviewer #2 (Public review):

      Summary:

      The study tries to assess how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites. They do so by correlating the present distribution of the species with a set of environmental variables. The data on species distributions come from eBird. The main issue lies in the problematic assumption that species correlations between their current distribution and environment were about the same before the rise of the Plateau. There is no ground truthing and the study relies on Movebank data of only 7 species which are not even listed in the study. Similarly, the study does not outline the boundaries of breeding sites NE of the Plateau. Thus it is absolutely unclear potentially which breeding populations it covers.

      We are very grateful for the careful review and helpful suggestions. We have revised the manuscript carefully in response to the reviewer’s comments and believe that it is much improved as a result. Below are our point-by-point replies to the comments.

      Strengths:

      I like the approach for how you combined various environmental datasets for the modelling part.

      We appreciate the reviewer’s encouragement.

      Weaknesses:

      The major weakness of the study lies in the assumption that species correlations between their current distribution and environments found today are back-projected to the far past before the rise of the Q-T Plateau. This would mean that species responses to the environmental cues do not evolve which is clearly not true. Thus, your study is a very nice intellectual exercise of too many ifs.

      This is a valid concern. We have addressed this from both the perspectives of the theoretical design of our study and empirical evidence.

      First, we agree with the reviewer that species responses to environmental cues might vary over time. Nonetheless, the simulated environments before the uplift of the plateau serve as a counterfactual state in our study. The counterfactual is an important concept for supporting causal claims: it compares what happened with what would have happened in a hypothetical situation, i.e., “If event X had not occurred, event Y would not have occurred” (Lewis 1973). Recent years have seen increasing application of the counterfactual approach to detect biodiversity change, i.e., comparing diversity between the counterfactual state and real estimates to attribute the factors causing such changes (e.g., Gonzalez et al. 2023). Whilst we do not aim to provide causal inferences for avian distributional change, the counterfactual approach allows us to estimate the influence of the plateau uplift by detecting changes in avian distributions, i.e., by comparing where the birds would have been distributed without the plateau with where they are currently distributed. We regard the counterfactual environments as a powerful tool for eliminating vagueness to the extent possible, as opposed to simply describing the current distributions of birds. We therefore assume that species’ responses to environments are conservative and that their evolution should not discount our findings. We have clarified this in the Introduction (Lines 81-93).

      Second, we used species distribution modelling to contrast the distributions of birds before and after the uplift of the plateau under the assumption that species tend to retain their ancestral ecological traits over time (i.e., niche conservatism). This implies that species are likely to occupy similar environments wherever these are available. In particular, because bird distributions are more likely to be influenced by food resources and vegetation distributions (Qu et al. 2010, Li et al. 2021, Martins et al. 2024), and because the food and vegetation available before the uplift could provide suitable habitats for birds (Jia et al. 2020), we believe the findings can provide valuable insights into the influence of the plateau rise on avian migratory patterns. Having said that, we acknowledge that other factors, e.g., carbon dioxide concentrations (Zhang et al. 2022), can influence the simulated environments and our predictions of avian distribution. We have clarified the assumptions and the evidence supporting the modelling in the Methods (Lines 362-370).

      The second major drawback lies in the way you estimate the migratory routes of particular birds. No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites. Some might overwinter in India, some populations in Africa and you will never know the teleconnections between breeding and wintering sites of particular species. The few available tracking studies (seven!) are too coarse and with limited aspects of migratory connectivity to give answer on the target questions of your study.

      We agree with the reviewer that establishing connections between breeding and wintering sites is important for estimating the migration patterns of birds. We employed a dynamic model to assess the weekly distributions of species. Thus, we can track the movement of species every week and capture the breeding and wintering areas of specific populations. That said, we acknowledge that our approach can be subject to the patchy sampling of eBird data. In contrast, tracking data can provide detailed information on the movement patterns of species but are limited to small numbers of species because of the considerable cost and time needed. We aimed to use tracking data to examine the influence of focal factors on avian migration patterns but, despite our best efforts, could acquire data for only seven species. Moreover, similar results were found in studies that used tracking data to estimate the distribution of breeding and wintering areas of birds in the plateau (e.g., Prosser et al. 2011, Zhang et al. 2011, Zhang et al. 2014, Liu et al. 2018, Kumar et al. 2020, Wang et al. 2020, Pu and Guo 2023, Yu et al. 2024, Zhao et al. 2024). We believe the conclusions based on seven species are rigorous, but their implications could be restricted by the number of tracked species we obtained. We have better demonstrated how our findings on the breeding and wintering areas of birds are reinforced by other studies reporting the locations of those areas. We have also added a separate caveat section to discuss the limitations stated above (Lines 202-215).

      Your set of species is unclear, selection criteria for the 50 species are unknown and variability in their migratory strategies is likely to affect the direction of the effects.

      In this revision, we have clarified the selection criteria for the 50 species and outlined the boundaries of the breeding areas of all birds (Lines 243-249). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted from the full list the species identified as full migrants by BirdLife International (https://datazone.birdlife.org/species/spcdistPOS). Migratory birds may follow a capital or income migratory strategy, depending on how much they rely on endogenous energy reserves acquired before reproduction. We have added discussion of how these migratory strategies might influence the effects of environment on migratory direction (Lines 183-200).

      In addition, the position of the breeding sites relative to the Q-T plate will affect the azimuths and resulting migratory flyways. So in fact, we have no idea what your estimates mean in Figure 2.

      We calculated the azimuths not only from the angles between breeding sites and wintering sites but also from the angles between the stopovers of birds. Therefore, the azimuths are influenced by the relative positions of breeding, wintering and stopover sites. This should minimize the errors that could arise from using breeding areas alone, such as the biases caused by the locations of breeding areas relative to the QTP, as the reviewer pointed out. We have better explained this in the Introduction, Methods and legend of Figure 2.
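
      As a minimal sketch of the leg-by-leg azimuth calculation described above, the following assumes that each azimuth is the initial great-circle bearing between two adjacent sites on an ordered route (breeding site, stopovers, wintering site). The response does not state the exact formula used, so the bearing formula, the function names and the example coordinates are illustrative assumptions rather than the authors' implementation.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise
    from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def leg_azimuths(route):
    """Bearing of every leg between adjacent sites on an ordered route.

    `route` is a list of (lat, lon) tuples, e.g. breeding site,
    stopovers in order of visit, then wintering site.
    """
    return [initial_bearing(*a, *b) for a, b in zip(route, route[1:])]


# Hypothetical route (not real tracking data): one stopover between
# a breeding site and a wintering site.
example_route = [(10.0, 80.0), (0.0, 80.0), (0.0, 90.0)]
example_azimuths = leg_azimuths(example_route)
```

      Because every leg contributes one azimuth, a route with stopovers yields several directions rather than a single breeding-to-wintering angle, which is the property the response relies on.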

      There is no way one can assess the performance of your statistical exercises, e.g. performances of the models.

      As suggested, we have reported the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) to assess the performance of the models (Table S1). AUC is a threshold-independent measure of the ability to discriminate between presence and random points (Phillips et al. 2006). A model was considered good when its AUC value was higher than 0.75 (Elith et al. 2006) (Lines 379-383).
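
      To illustrate why AUC is threshold-independent, the sketch below computes it as the fraction of (presence, random) point pairs in which the presence point receives the higher predicted suitability, counting ties as one half. This rank-based formulation is equivalent to the area under the ROC curve. The function name and the scores are hypothetical and are not taken from Table S1.

```python
def auc_from_scores(presence_scores, random_scores):
    """AUC as the probability that a presence point outranks a random point.

    Each argument is a list of model-predicted suitability values.
    Ties contribute 0.5, so a model with no discrimination scores 0.5.
    """
    wins = 0.0
    for p in presence_scores:
        for r in random_scores:
            if p > r:
                wins += 1.0
            elif p == r:
                wins += 0.5
    return wins / (len(presence_scores) * len(random_scores))


# Hypothetical predicted suitabilities for illustration only.
presence = [0.9, 0.8, 0.4]
background = [0.5, 0.3, 0.2, 0.1]
example_auc = auc_from_scores(presence, background)

# Threshold quoted in the response (Elith et al. 2006).
is_good_model = example_auc > 0.75
```

      No classification threshold appears anywhere in the calculation; only the relative ordering of the scores matters.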

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. With the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Please see our responses above.

      Reviewer #2 (Recommendations for the authors):

      Methodological issues:

      Line 219 Why have you selected only 64 species and what were the selection criteria?

      We have clarified the selection criteria (Lines 243-248). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted from the full list the species identified as full migrants by BirdLife International (https://datazone.birdlife.org/species/spcdistPOS).

      Minor:

      Line 219 eBird has very uneven distribution, especially in vast areas of Russia. How can your exercise on Lines 232-238 overcome this issue?

      Yes, eBird data can be biased due to patchy sampling and variation in observers’ skill in identifying species. To address this issue, we developed an adaptive spatio-temporal modelling package (stemflow; Chen et al. 2024) to correct the imbalanced distribution of data, and we modelled observer experience to address the bias in species recognition. stemflow is based on a machine learning modelling framework (AdaSTEM) that leverages the spatio-temporal adjacency information of sample points to model the occurrence or abundance of species at different scales. This framework has been frequently used in modelling eBird data (Fink et al. 2013, Johnston et al. 2015, Fink et al. 2020) and has proven efficient for multi-scale spatio-temporal data modelling. We have better explained this (Lines 251-270; Lines 307-321).

      Line 54 This sentence sounds very empty and in fact does not tell us much.

      We have adjusted this sentence to “Animal movement underpins species’ spatial distributions and ecosystem processes”.

      Line 55 Again a sentence that implies a causality of the annual cycle to make the species migrate. It does not make sense.

      We have revised this sentence to “An important animal movement behaviour is migrating between breeding and wintering grounds”.

      Line 58 How is our fascination with migratory journeys related to the present article? I think this line is empty.

      We have changed this sentence to “Those migratory journeys have intrigued a body of different approaches and indicators to describe and model migration, including migratory direction, speed, timing, distance, and staging periods”.

      Figure 1 - ABC insets are OK, but a combination of lati- and longitudinal patterns is possible, e.g. in species with conservative strategies or for whatever other reason.

      Thank you for the suggestion. We kept the ABC insets separate rather than combining them, as we believe this conveys the influence of the QTP uplift under different scenarios more clearly.

      The legend to Figure 2 is not self-explanatory. Please make it clear what the response variable is and its units. The first line of the legend should read something like The influence of environmental factors on the direction of avian migration.

      Thank you. We have amended the legend of Figure 2 as suggested:

      “Figure 2. The influence of environmental factors on the direction of avian migration.  Migratory directions are calculated based on the azimuths between each adjacent stopover, breeding and wintering areas for each species. We employ multivariate linear regression models under the Bayesian framework to measure the correlation between environmental factors and avian migratory directions. Wind represents the wind cost calculated by wind connectivity. Vegetation is measured by the proportion of average vegetation cover in each pixel (~1.9° in latitude by 2.5° in longitude). Temperature is the average annual temperature. Precipitation is the average yearly precipitation. All environmental layers are obtained using the Community Earth System Model. West QTP, central QTP, and East QTP denote areas in the areas west (longitude < 73°E), central (73°E ≤ longitude < 105°E), and east of (longitude ≥ 105°E) the Qinghai-Tibet Plateau, respectively.”

      References

      Chen, Y., Z. Gu, and X. Zhan. 2024. stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model. Journal of Open Source Software 9:6158.

      Elith, J., C. H. Graham, R. P. Anderson, M. Dudík, S. Ferrier, A. Guisan, R. J. Hijmans, F. Huettmann, J. R. Leathwick, A. Lehmann, J. Li, L. G. Lohmann, B. A. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. M. Overton, A. Townsend Peterson, S. J. Phillips, K. Richardson, R. Scachetti-Pereira, R. E. Schapire, J. Soberón, S. Williams, M. S. Wisz, and N. E. Zimmermann. 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29:129-151.

      Fink, D., T. Auer, A. Johnston, V. Ruiz-Gutierrez, W. M. Hochachka, and S. Kelling. 2020. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications 30:e02056.

      Fink, D., T. Damoulas, and J. Dave. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. Pages 1284-1290 in Proceedings of the AAAI Conference on Artificial Intelligence.

      Gonzalez, A., J. M. Chase, and M. I. O'Connor. 2023. A framework for the detection and attribution of biodiversity change. Philosophical Transactions of the Royal Society B: Biological Sciences 378.

      Jia, Y., H. Wu, S. Zhu, Q. Li, C. Zhang, Y. Yu, and A. Sun. 2020. Cenozoic aridification in Northwest China evidenced by paleovegetation evolution. Palaeogeography, Palaeoclimatology, Palaeoecology 557:109907.

      Johnston, A., D. Fink, M. D. Reynolds, W. M. Hochachka, B. L. Sullivan, N. E. Bruns, E. Hallstein, M. S. Merrifield, S. Matsumoto, and S. Kelling. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. Ecological Applications 25:1749-1756.

      Kumar, N., U. Gupta, Y. V. Jhala, Q. Qureshi, A. G. Gosler, and F. Sergio. 2020. GPS-telemetry unveils the regular high-elevation crossing of the Himalayas by a migratory raptor: implications for definition of a “Central Asian Flyway”. Scientific Reports 10:15988.

      Lewis, D. 1973. Counterfactuals. Oxford: Blackwell.

      Li, S.-F., P. J. Valdes, A. Farnsworth, T. Davies-Barnard, T. Su, D. J. Lunt, R. A. Spicer, J. Liu, W.-Y.-D. Deng, J. Huang, H. Tang, A. Ridgwell, L.-L. Chen, and Z.-K. Zhou. 2021. Orographic evolution of northern Tibet shaped vegetation and plant diversity in eastern Asia. Science Advances 7:eabc7741.

      Liu, D., G. Zhang, H. Jiang, and J. Lu. 2018. Detours in long-distance migration across the Qinghai-Tibetan Plateau: individual consistency and habitat associations. PeerJ 6:e4304.

      Martins, L. P., D. B. Stouffer, P. G. Blendinger, K. Böhning-Gaese, J. M. Costa, D. M. Dehling, C. I. Donatti, C. Emer, M. Galetti, R. Heleno, Í. Menezes, J. C. Morante-Filho, M. C. Muñoz, E. L. Neuschulz, M. A. Pizo, M. Quitián, R. A. Ruggera, F. Saavedra, V. Santillán, M. Schleuning, L. P. da Silva, F. Ribeiro da Silva, J. A. Tobias, A. Traveset, M. G. R. Vollstädt, and J. M. Tylianakis. 2024. Birds optimize fruit size consumed near their geographic range limits. Science 385:331-336.

      Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

      Prins, H. H. T., and T. Namgail. 2017. Bird migration across the Himalayas: wetland functioning amidst mountains and glaciers. Cambridge University Press, Cambridge.

      Prosser, D. J., P. Cui, J. Y. Takekawa, M. Tang, Y. Hou, B. M. Collins, B. Yan, N. J. Hill, T. Li, Y. Li, F. Lei, S. Guo, Z. Xing, Y. He, Y. Zhou, D. C. Douglas, W. M. Perry, and S. H. Newman. 2011. Wild Bird Migration across the Qinghai-Tibetan Plateau: A Transmission Route for Highly Pathogenic H5N1. PLoS ONE 6:e17622.

      Pu, Z., and Y. Guo. 2023. Autumn migration of black-necked crane (Grus nigricollis) on the Qinghai-Tibetan and Yunnan-Guizhou plateaus. Ecology and Evolution 13:e10492.

      Qu, Y., F. Lei, R. Zhang, and X. Lu. 2010. Comparative phylogeography of five avian species: implications for Pleistocene evolutionary history in the Qinghai-Tibetan plateau. Molecular Ecology 19:338-351.

      Wang, Y., C. Mi, and Y. Guo. 2020. Satellite tracking reveals a new migration route of black-necked cranes (Grus nigricollis) in Qinghai-Tibet Plateau. PeerJ 8:e9715.

      Yu, X., G. Song, H. Wang, Q. Wei, C. Jia, and F. Lei. 2024. Migratory flyways and connectivity of Brown Headed Gulls (Chroicocephalus brunnicephalus) revealed by GPS tracking. Global Ecology and Conservation 56:e03340.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, T. Ma, L.-X. Chen, and Z. Xing. 2014. Migration routes and stopover sites of Pallas’s Gulls Larus ichthyaetus breeding at Qinghai Lake, China, determined by satellite tracking. Forktail 30:104-108.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, Z. Xing, and F.-S. Li. 2011. Migration Routes and Stop-Over Sites Determined with Satellite Tracking of Bar-Headed Geese (Anser indicus) Breeding at Qinghai Lake, China. Waterbirds 34:112-116.

      Zhang, R., D. Jiang, C. Zhang, and Z. Zhang. 2022. Distinct effects of Tibetan Plateau growth and global cooling on the eastern and central Asian climates during the Cenozoic. Global and Planetary Change 218:103969.

      Zhao, T., W. Heim, R. Nussbaumer, M. van Toor, G. Zhang, A. Andersson, J. Bäckman, Z. Liu, G. Song, M. Hellström, J. Roved, Y. Liu, S. Bensch, B. Wertheim, F. Lei, and B. Helm. 2024. Seasonal migration patterns of Siberian Rubythroat (Calliope calliope) facing the Qinghai–Tibet Plateau. Movement Ecology 12:54.

    1. Author response:

      Reviewer #1 (Public Review):

      Overall, I find only two minor weaknesses. First, the insights of this study are, first and foremost, of feed-forward nature, and a feed-forward network would have been enough (and the more parsimonious model) to illustrate the results. While using a recurrent neural network (RNN) shows that the results are, in general, compatible with recurrent dynamics, the specific limitations imposed by RNNs (e.g., dynamical stability, low-dimensional internal dynamics) are not the focus of this study. Indeed, the additional RNN models in the supplementary material show that under more constrained conditions for the RNN (low-dimensional dynamics), using the input control alone runs into difficulties.

      We thank the reviewer for raising this important point. While we agree that recurrent dynamics were not the focus of this study, we would like to point out that 1) dynamics, of some kind, are necessary to simulate the decoder fitting process and 2) recurrent neural networks (RNNs) are valuable for obtaining general insights on how biological constraints shape the reachable manifold:

      (1) To simulate the decoder fitting process, we had to simulate neural activity during the so-called “calibration task”. These responses need some dynamics to produce a population response with dimensionality resembling what was found in experiments (10 dimensions). Moreover, dynamics are necessary to create a common direction of high variance across population responses to the calibration task stimuli (see Supplementary Figure 2a and surrounding discussion), which in turn is necessary to reproduce the biases in readouts demonstrated in Figure 4 (as many within-manifold decoder perturbations are aligned with it; Supplementary Figure 2b).

      Because feed-forward networks lack dynamics, reproducing our results with a feed-forward network would require using an input with dynamics. Rather than making an arbitrary choice for these input dynamics, we chose to keep the input static and instead generate the dynamics with an RNN, which is in line with recent models of motor cortex.

      We agree, however, that this is an important point worth clarifying in the manuscript. In our revision we will aim to add a demonstration of how to reproduce a subset of our results with a feed-forward network and a dynamic input.

      (2) While we agree that RNNs impose certain limitations over feed-forward networks, we see these limitations as an advantage because they provide a framework for understanding the structure of the reachable manifold in terms of biological constraints. For example, our simulations in Supplementary Figure 1 show that the dimensionality of the reachable manifold is highly dependent on recurrent connectivity: inhibition-stabilized connectivity makes it higher-dimensional whereas task-specific optimized connectivity makes it lower-dimensional. Such insights are valuable to understand the broader implications and experimental predictions of the re-aiming strategy.

      Because feed-forward networks are untied from the reality of recurrent cortical circuitry, they cannot be characterized in terms of such biological constraints. For instance, as the reviewer points out, dynamical stability is not a well-defined property of feed-forward networks. Such models therefore cannot provide any insight into how the biological constraint of dynamical stability could influence the reachable manifold (which we show it does in Figure 5b). Relatedly, feed-forward networks cannot be optimized to solve complex spatiotemporal tasks like the ballistic reaching task we used for our task-optimized RNN (Supplementary Figure 1, right column), so cannot be used to understand how such behavioral constraints would influence the reachable manifold.

      We agree that these reasons for using RNNs are subtle and are currently left implicit in the text. We will add a discussion point clarifying them in our revision.

      Second, explaining the quantitative differences between the model and data for shifts in tuning curves seems to take the model a bit too literally. The model serves greatly for qualitative observations. I assume, however, that many of the unconstrained aspects of the model would yield quantitatively different results.

      We completely agree: our model is best used to provide a qualitative description of the capabilities of the re-aiming strategy. We will be sure to revise our manuscript to keep such quantitative comparisons to a minimum.

      Reviewer #2 (Public Review):

      The authors mention alternative models (eg, based on synaptic plasticity in the RNN and/or input weights) that can explain the same experimental data that they do, they do not provide any direct comparisons to those models. Thus, the main argument that the authors have in favor of their model is the fact that it is more plausible because it relies on performing the optimization in a low-dimensional space. It would be nice to see more quantitative arguments for why the re-aiming strategy may be more plausible than synaptic plasticity (either by showing that it explains data better, or explaining why it may be more optimal in the context of fast learning).

      We agree this remains a limitation of our study. To contrast our re-aiming model with models of synaptic plasticity (in the input and/or recurrent weights), we have included substantial discussion of these alternative models in two sections of the manuscript:

      • Introduction, where we elaborate on the argument that synaptic plasticity requires solving an exceptionally difficult optimization problem in high dimensions

      • Discussion section “The role of synaptic plasticity in BCI learning”, where we review a number of synaptic plasticity models and experimental results they can account for

      We fully agree that more quantitative comparisons remain an important follow-up to this line of research. However, it is worth noting that there are many such models out there. Moreover, as is the case with many computational models, the results one can achieve with any given model can be highly sensitive to a number of different hyperparameters (e.g. learning rates). We therefore feel that a more rigorous comparison requires deeper study and is out of scope of this manuscript.

      In particular, the authors model the adaptation to outside-manifold perturbations (OMPs) through a "generalized re-aiming strategy". This assumes the existence of additional command variables, which are not used in the original decoding task, but can then be exploited to adapt to these OMPs. While this model is meant to capture the fact that optimization is occurring in a low-dimensional subspace, the fact that animals take longer to adapt to OMPs suggests that WMPs and OMPs may rely on different learning mechanisms, and that synaptic plasticity may actually be a better model of adaptation to OMPs. 

      We thank the reviewer for raising this question. We agree that the fact that animals take longer to adapt to OMPs suggests that the underlying learning strategy is somehow different. But the argument we try to make in this section of the paper is that it in fact does not require an entirely different mechanism. Our simulations show that the same mechanism of re-aiming can suffice to learn OMPs, but it simply requires re-aiming in the larger space of all command variables available to the motor system (rather than just the two command variables evoked by the calibration task). Because this is a much higher-dimensional search space (10-20 vs. 2 dimensions, which is a substantial difference due to the curse of dimensionality), we argue that learning should be slower, even though the mechanism (i.e. re-aiming) is the same.

      This is an important and somewhat surprising takeaway from these simulations, which we will try to bring up more explicitly and clearly in the revision.
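
      The scale of the difference between a 2-dimensional and a 10-20-dimensional search space can be made concrete with a small worked example. The sketch below assumes, purely for illustration, an exhaustive search that tries a fixed number of candidate values per command variable. The response does not say how the search over command variables is carried out, so the grid search and the choice of 10 values per dimension are assumptions, not the authors' learning model.

```python
def grid_candidates(values_per_dim, n_dims):
    """Number of points on a regular grid with `values_per_dim` values
    along each of `n_dims` dimensions (illustrative search cost)."""
    return values_per_dim ** n_dims


# Hypothetical resolution: 10 candidate values per command variable.
VALUES_PER_DIM = 10

calibration_space = grid_candidates(VALUES_PER_DIM, 2)    # 2 command variables
generalized_low = grid_candidates(VALUES_PER_DIM, 10)     # 10 command variables
generalized_high = grid_candidates(VALUES_PER_DIM, 20)    # 20 command variables

# Ratio of search-space sizes between the 10-D and 2-D cases.
slowdown_factor = generalized_low // calibration_space
```

      Under this assumption, moving from 2 to 10 dimensions multiplies the number of candidates by 10^8, which is the kind of growth the phrase "curse of dimensionality" refers to.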

      It would be important to discuss how exactly generalized re-aiming would differ from allowing plasticity in the input weights, or in all weights in the network. Do those models make different predictions, and could they be differentiated in future experiments?

      They do in fact make different predictions, and we thank the reviewer for asking and pointing out the lack of discussion of this point. The key difference between these two learning mechanisms is demonstrated in Figure 5b: under generalized re-aiming, there is a fundamental limit to the set of activity patterns one can learn to produce in the brain-computer interface (BCI) learning task. This is quantified in that analysis by the asymptotic participation ratio of the reachable manifold as K increases, which indicates that there is a limited ~12-dimensional subspace that the reachable manifold can occupy. The specific orientation of this subspace is determined by the (recurrent and input) connectivity of the recurrent neural network. With synaptic plasticity in any of the weight matrices (Wrec, Win, U), this subspace could be re-oriented in any arbitrary direction. Our theory of “generalized re-aiming” therefore predicts that the reachable manifold 1) is constrained to a low-dimensional subspace and 2) is not modified when learning BCIs with outside-manifold perturbations.

      Experimentally testing this would require a within-/outside-manifold perturbation BCI learning task akin to that of Sadtler et al., but where the “intrinsic manifold” is measured from population responses evoked by every possible motor command so as to entirely contain the full reachable manifold at max K. This would require measuring motor cortical activity during naturalistic behavior under a wide range of conditions, rather than just in response to the 2D cursor movements on the screen used in the calibration task of the original study. In this case, learning outside-manifold perturbations would require re-orienting the reachable manifold, so a pure generalized re-aiming strategy would fail to learn them. Synaptic plasticity, on the other hand, would not.

      We will be sure to elaborate further on this claim in the revised manuscript.
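
      For readers unfamiliar with the participation ratio used above to quantify the dimensionality of the reachable manifold, a minimal sketch follows. It assumes the common definition PR = (Σλ)² / Σλ², where λ are the eigenvalues of the covariance matrix of population activity. The response does not spell out the definition used in Figure 5b, so this definition, the function names and the toy data are assumptions. The sketch uses the identities Σλ = trace(C) and Σλ² = sum of squared entries of C (for symmetric C) to avoid an explicit eigendecomposition.

```python
def covariance_matrix(samples):
    """Sample covariance (n - 1 denominator) of rows of `samples`.

    `samples` is a list of activity patterns, one list of per-neuron
    values per pattern.
    """
    n = len(samples)
    d = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def participation_ratio(samples):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Computed as trace(C)^2 divided by the squared Frobenius norm of C,
    which equals the eigenvalue expression for a symmetric matrix C.
    """
    cov = covariance_matrix(samples)
    trace = sum(cov[i][i] for i in range(len(cov)))
    frobenius_sq = sum(v * v for row in cov for v in row)
    return trace ** 2 / frobenius_sq


# Toy data (hypothetical): two "neurons" with independent, equal-variance
# activity occupy two dimensions; perfectly correlated activity occupies one.
independent = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
correlated = [(1, 1), (-1, -1), (2, 2), (-2, -2)]
pr_independent = participation_ratio(independent)
pr_correlated = participation_ratio(correlated)
```

      The measure equals the number of dimensions when variance is spread evenly across them and falls towards 1 as variance concentrates in a single direction, so a plateau in this value as K grows indicates a bounded subspace.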

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The manuscript's logical flow is challenging and hard to follow, and key arguments could be more clearly structured, particularly in transitions between mechanistic components.

      We will revise our manuscript to make the logical flow easier to follow, particularly in the transitions between mechanistic components.

      (2) The causality between stress-induced α2A-AR internalization and the enhanced MAO-A remains unclear. Direct experimental evidence is needed to determine whether α2A-AR internalization itself or Ca<sup>2+</sup> drives MAO-A activation, and how they activate MAO-A should be considered.

      We believe that the causality between stress-induced α2A-AR internalization and the enhancement of MAO-A is clearly demonstrated by our current experiments, although our explanations could be made easier to follow, especially for readers who are not experts in electrophysiology.

      Firstly, it is well established that autoinhibition in LC neurons is mediated by α2A-AR-coupled GIRK (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience). We found that spike frequency adaptation in LC neurons was also mediated by α2A-AR-coupled GIRK-I (Fig. 1A-I), and that α2A-AR-coupled GIRK-I underwent [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown (Figs. 2, S1, S2), leading to an abolishment of spike frequency adaptation (Fig. S4). [Ca<sup>2+</sup>]<sub>i</sub>-dependent rundown of α2A-AR-coupled GIRK-I was prevented by barbadin (Fig. 2G-J), which prevents the internalization of G-protein coupled receptor (GPCR) channels.

      Abolishment of spike frequency adaptation itself, i.e., “increased spike activity” can increase [Ca<sup>2+</sup>]<sub>i</sub> because [Ca<sup>2+</sup>]<sub>i</sub> is entirely dependent on the spike activity as shown by Ca<sup>2+</sup> imaging method in Figure S3.

      Thus, α2A-AR internalization can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and a [Ca<sup>2+</sup>]<sub>i</sub> increase drives MAO-A activation as reported previously (Cao et al., 2007, BMC Neurosci). The mechanism by which Ca<sup>2+</sup> activates MAO-A is beyond the scope of the current study.

      Our study focused on the mechanism by which chronic or severe stress can cause persistent overexcitation and how this results in LC degeneration.

      (3) The connection between α2A-AR internalization and increased cytosolic NA levels lacks direct quantification, which is necessary to validate the proposed mechanism.

      Direct quantification of the relationship between α2A-AR internalization and increased cytosolic NA levels may not be possible, and may not need to be demonstrated, as explained below.

      The internalization of α2A-AR can increase [Ca<sup>2+</sup>]<sub>i</sub> through the abolishment of autoinhibition or spike frequency adaptation, and [Ca<sup>2+</sup>]<sub>i</sub> increases can facilitate NA autocrine (Huang et al., 2007), similar to the transmitter release from nerve terminals (Kaeser & Regehr, 2014, Annu Rev Physiol).

      Autocrine-released NA must be taken up again by NAT (NA transporter), a process that is firmly established (Torres et al., 2003, Nat Rev Neurosci). Re-uptake of NA by NAT is the only source of intracellular NA, and NA re-uptake by NAT should increase as the internalization of the NA binding site (α2A-AR) progresses in association with [Ca<sup>2+</sup>]<sub>i</sub> increases (see page 11, lines 334-336).

      Thus, the connection between α2A-AR internalization and increased cytosolic NA levels is logically compelling. Quantification of this connection may not be possible at present (see our response to Reviewer #1's Recommendations for the authors, comment (2)) and is beyond the scope of our current study.

      (4) The chronic stress model needs further validation, including measurements of stress-induced physiological changes (e.g., corticosterone levels) to rule out systemic effects that may influence LC activity. Additional behavioral assays for spatial memory impairment should also be included, as a single behavioral test is insufficient to confirm memory dysfunction.

      It is well established that restraint stress (RS) increases corticosterone levels depending on the period of RS (García-Iglesias et al., 2014, Neuropharmacology), although we are willing to measure the corticosterone levels. In addition, there are numerous reports showing increased activity of LC neurons in response to various stresses (Valentino et al., 1983; Valentino and Foote, 1988; Valentino et al., 2001; McCall et al., 2015), as described in the text (page 4, lines 96-98). Measurement of corticosterone levels may not be able to rule out systemic effects of CRS on the whole brain.

      We have already performed another behavioral test, the elevated plus maze (EPM) test.

      By combining the two tests, it may be possible to evaluate the Y-maze results more accurately by differentiating memory impairment from anxiety. However, these behavioral results are supplementary to our current aim, which is to elucidate the cellular mechanisms underlying the accumulation of cytosolic free NA. We will soften the implications regarding anxiety and memory impairment.

      (5) Beyond β-arrestin binding, the role of alternative internalization pathways (e.g., phosphorylation, ubiquitination) in α2A-AR desensitization should be considered, as current evidence is insufficient to establish a purely Ca<sup>2+</sup>-dependent mechanism.

      We can hardly agree with this comment.

      It was clearly demonstrated that repeated application of NA itself did not cause desensitization of α2A-AR (Figure S1A-D), and that the blockade of β-arrestin binding by barbadin completely suppressed the Ca<sup>2+</sup>-dependent downregulation of GIRK (Fig. 2G-K). These observations can clearly rule out the possible involvement of phosphorylation or ubiquitination for the desensitization.

      Not only the barbadin experiment, but also the immunohistochemistry and western blot method clearly demonstrated the decrease of α2A-AR expression on the cell membrane (Fig. 3).

      Ca<sup>2+</sup>-dependent mechanism of the rundown of GIRK was convincingly demonstrated by a set of different protocols of voltage-clamp study, in which Ca<sup>2+</sup> influx was differentially increased. The rundown of GIRK-I was orderly potentiated or accelerated by increasing the number of positive command pulses each of which induces Ca<sup>2+</sup> influx (compare Figure S1E-J, Figure S2A-E and Figure S2F-K along with Fig. 2A-F). The presence or absence of Ca<sup>2+</sup> currents and the amount of Ca<sup>2+</sup> currents determined the trend of the rundown of GIRK-I (Figs. 2, S1 and S2). Because the same voltage protocol hardly caused the rundown when it did not induce Ca<sup>2+</sup> currents in the absence of TEA (Fig. S1F; compare with Fig. 2B), blockade of Ca<sup>2+</sup> currents by nifedipine would not be so beneficial.

      We believe the series of voltage-clamp protocols convincingly demonstrated the orderly involvement of [Ca<sup>2+</sup>]<sub>i</sub> in accelerating the rundown of GIRK-I.

      (6) NA leakage for free NA accumulation is also influenced by NAT or VMAT2. Please discuss the potential role of VMAT2 in NA accumulation within the LC in AD.

      We will discuss the role of VMAT2 in NA accumulation, especially when VMAT2 is impaired. Indeed, it has been demonstrated that reduced VMAT2 levels increase susceptibility to neuronal damage: VMAT2 heterozygote mice displayed increased vulnerability to MPTP, as evidenced by reductions in nigral dopamine cell counts (Takahashi et al, 1997, PNAS). Thus, if the activity of VMAT2 in LC neurons is impaired by chronic restraint stress, cytosolic NA levels in LC neurons would increase. We will add this discussion to the revised manuscript.

      (7) Since the LC is a small brain region, proper staining is required to differentiate it from surrounding areas. Please provide a detailed explanation of the methodology used to define LC regions and how LC neurons were selected among different cell types in brain slices for whole-cell recordings.

      LC neurons were identified immunohistochemically and electrophysiologically as we previously reported (see Fig. 2 in Front. Cell. Neurosci. 16:841239. doi: 10.3389/fncel.2022.841239). A delayed spiking pattern in response to depolarizing pulses (Figure S9) applied at a hyperpolarized membrane potential was commonly observed in LC neurons in many studies (Masuko et al., 1986; van den Pol et al., 2002; Wagner-Altendorf et al., 2019).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      In our study, the normalized relative value of AEP-mediated tau cleavage (Tau N368) was much higher in CRS mice than in non-stress wild-type mice. It is not possible to compare AEP-mediated tau cleavage between our non-stress wild-type mice and those observed in a previous study (Zhang et al., 2014, Nat Med), because band intensity is largely dependent on the exposure time and its numerical value is a normalized relative value. In view of such differences, our apparent band expression might have been intensified to detect small changes.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      GIRK rundown was almost saturated after 3-day RS and remained the same in 5-day RS mice (Fig. 4A-G), which is consistent with the downregulation of α2A-AR and GIRK1 expression by 3-day RS (Fig. 3C, F and G; Fig. 4J and K). However, we examined the protein levels of MAO-A, pro/active-AEP and Tau N368 only in 5-day RS mice and not in 3-day RS mice. This is because we considered that 3-day RS may be insufficient to induce changes in MAO-A, AEP and Tau N368, and that some period of high [Ca<sup>2+</sup>]<sub>i</sub> may be necessary to induce such changes. We will discuss this in the revised manuscript.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      Please see our response to the comment (2).

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Thank you for your suggestion. We will revise accordingly.

    1. Author response:

      We thank the referees for finding our work well written and systematic. We are planning a revision of the manuscript based on the public review and the confidential recommendations of the referees.

      The role of axons:

      Indeed, radial axon projections appear before mature epithelial stripes in the cornea (Iannaccone et al., 2012). Our claim is, however, not that guidance cues are absent, but that global cues are unnecessary. The alignment term in our model, together with evidence that corneal epithelial cells follow contact-mediated substrate cues (Walczysko et al., 2016), shows that corneal cell migration is responsive to external forces, and the underlying patterns of axonal projections could be one of those cues.

      Experiments (Collinson et al., 2002) and simulations in this work show that a rapid spiral epithelial flow forms first, with cells migrating radially for ~2 weeks before stripes become visible. Axons seeking the path of least resistance within this moving basal layer would therefore appear radial early on. By contrast, establishing visible stripes requires an entire cohort of epithelial cells to travel from the limbus to the central cornea (Fig. 7). Extensive in-vivo studies (Song et al., 2004; Leiper et al., 2009) find no evidence that axons direct epithelial migration; if anything, epithelial flow dictates axonal trajectories.

      Geometry and boundaries:

      The spiral also forms on a flat disc, but its exact shape changes with curvature and cap angle; this variation is seen across mammals, including humans (Dua et al., 1993) and in diseases such as keratoconus. On a spherical cap the boundary winding number fixes the interior index, so ongoing limbal influx keeps the total index = 1. 
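
      As a rough illustration of what a boundary winding number measures, the sketch below counts how many full turns a planar vector field makes along a circle around the origin. The function name and the toy fields (inward, spiral, saddle, uniform) are hypothetical examples, not the flow fields of our model, and the calculation is done on a flat disc for simplicity.

```python
import math


def winding_number(field, radius=1.0, samples=720):
    """Count the turns of field(x, y) -> (vx, vy) along a circle of given radius."""
    total = 0.0
    prev = None
    for k in range(samples + 1):
        theta = 2.0 * math.pi * k / samples
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        vx, vy = field(x, y)
        angle = math.atan2(vy, vx)
        if prev is not None:
            # Wrap each increment into [-pi, pi) so branch cuts do not count as turns.
            delta = (angle - prev + math.pi) % (2.0 * math.pi) - math.pi
            total += delta
        prev = angle
    return round(total / (2.0 * math.pi))


# Toy fields (illustrative assumptions only).
inward = lambda x, y: (-x, -y)           # pure influx from the boundary
spiral = lambda x, y: (-x - y, x - y)    # influx combined with rotation
saddle = lambda x, y: (x, -y)            # stagnation point of saddle type
uniform = lambda x, y: (1.0, 0.0)        # no enclosed defect

for name, f in [("inward", inward), ("spiral", spiral),
                ("saddle", saddle), ("uniform", uniform)]:
    print(name, winding_number(f))
```

      In this toy setting, a purely inward field and an inward spiral give the same boundary winding number, which is the sense in which boundary influx constrains the total index while leaving the detailed interior pattern free.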

      In the revised version, we will therefore simulate a range of curvatures, cap angles, a prolate ellipsoid, and cases without limbal division, then compare with published data and disease states.

      In-vitro data and parameter fits:

      Although our dataset is limited, the inferred parameters match three independent in-vitro estimates (Kostanjevec et al. 2020; Saraswathibhatla et al. 2021; Kammeraat et al. in prep.). Spatial correlations exceed those expected from persistence alone, implying some polar alignment, consistent with Saraswathibhatla et al. 2021. Slide-scanner images that we will include in the revision show that cells are neither elongated nor nematically ordered. In the revision we will detail our parameter extraction, highlight evidence for alignment, stress the substrate-based activity mechanism, and draw attention to the supplementary videos.

      Topological clarification:

      Stagnation points can be seen as topological defects because classification depends only on vector directions. Boundary conditions can remove such defects in fluids, yet two sources/sinks still interact via the same logarithmic Green’s function that governs disclinations, despite different physics. The Euler characteristic is a property of the surface; while the boundary winding number fixes the field index, it does not alter the surface’s Euler characteristic.

      In the revision, we will add a concise primer on the differential-geometric concepts to make these points explicit.
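
      The standard relations behind these statements can be summarized as follows. This is a textbook-level sketch written in a planar chart of a disc-like domain D, with φ denoting the direction angle of the field v; the notation is introduced here for illustration and is not taken from the manuscript.

```latex
% Index of a vector field v at an isolated zero p,
% measured on a small loop C_p enclosing only p:
\operatorname{ind}_p(v) = \frac{1}{2\pi} \oint_{C_p} d\phi

% If v is non-zero on the boundary of D, the interior indices
% sum to the winding number of v along the boundary:
\sum_{p \in D} \operatorname{ind}_p(v) = \frac{1}{2\pi} \oint_{\partial D} d\phi

% If v points everywhere inward (or everywhere outward) across the boundary,
% this winding number equals the Euler characteristic of the disc:
\frac{1}{2\pi} \oint_{\partial D} d\phi = \chi(D) = 1

% Logarithmic Green's function of the two-dimensional Laplacian:
\nabla^2 G(\mathbf{r}) = \delta^2(\mathbf{r}), \qquad
G(\mathbf{r}) = \frac{1}{2\pi} \ln |\mathbf{r}|
```

      The Euler characteristic on the right-hand side of the third relation depends only on the surface, whereas the boundary integral depends on how the field behaves at the boundary.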

    1. Author response:

      We thank the reviewers for their thoughtful and generous assessment of our work. Overall, the reviewers found our work to be novel and relevant. In particular, Reviewer #1 found that our manuscript is “timely and highly valuable for the telomere field.” Reviewer #2 stated, “Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field.” Reviewer #3 stated that “The study is original, the experiments were well-controlled and excellently executed.”

      We are extremely grateful for these comments and want to thank all the reviewers and the editors for their time and effort in reviewing our work.

      The reviewers had a number of suggestions to improve our work. We have addressed all the points as highlighted in the point-by-point responses below.

      Reviewer 1:

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2:

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3:

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. As a result, we hope that we provide additional depth to our conclusions.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is the CST-Polα complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-Fok1), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, against Babesia. They proved the efficacy of CIP on Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 on the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      Authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      We appreciate your positive feedback. Your acknowledgment reinforces our commitment to rigor and thoroughness in our research.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na+ ATPase that is found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin. A 7-day treatment of cipargamin, when combined with a single dose of tafenoquine, was sufficient to eradicate Babesia microti in a mouse model of severe babesiosis caused by lack of adaptive immunity.

      Thank you for the comments and for your time to review our manuscript.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. In the SCID mouse model, cipargamin was tested in combination with tafenoquine but not with atovaquone and/or azithromycin, although the latter combination is often used as first-line therapy for human babesiosis caused by Babesia microti.

      Thank you for your insightful comments. We agree that using a single daily dose over 7 days is one of the limitations of the in vivo trial. Our main goals were to demonstrate cipargamin's efficacy and to understand its mechanism of action as an antibabesial agent. For future work, we plan to conduct dose‐optimization studies to determine the lowest effective dose in vivo. Regarding the drug combination in the SCID mouse model, although atovaquone and/or azithromycin are frequently used as first-line therapies for human babesiosis, resistance to these traditional drugs is emerging. Given this challenge, we opted to evaluate a combination with tafenoquine as a novel partner, aiming to overcome resistance issues and improve therapeutic outcomes.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      None other than some minor grammatical mistakes.

      We have corrected the grammatical mistakes.

      Reviewer #3 (Recommendations for the authors):

      The revised manuscript is much improved. I have the following comments.

      Comment 1: Atovaquone plus azithromycin is effective against Babesia microti (Figure 1C) but not against Babesia rodhaini (Figure 1E). It would be valuable to provide a possible explanation.

      Thank you for highlighting this issue. One potential explanation is that B. microti and B. rodhaini might have intrinsic differences in drug sensitivity and susceptibility. A previous study reported that both species possess a unique linear monomeric mitochondrial genome with a dual flip-flop inversion system, which generates four distinct genome structures (Hikosaka et al., 2012). In addition, previous studies have shown that mitochondria-associated energy production is greater in B. microti than in B. rodhaini (Shikano et al., 1998). This suggests that B. microti, whose metabolism is largely driven by mitochondrial function, may be more susceptible to drugs (like atovaquone) that induce parasite death by disrupting mitochondrial targets such as cytochrome b (Wormser et al., 2010). Moreover, B. rodhaini tends to proliferate more rapidly and causes acute infections, which may outpace any drug effects. Further, the rapid proliferation of apicomplexan parasites, as is the case in Plasmodium (Salcedo-Sora et al., 2014), Theileria (Metheni et al., 2015), and B. rodhaini (Rickard, 1970; Shikano et al., 1995), has been ascribed to glycolysis as the primary energy source. This may have contributed to the reduced efficacy of atovaquone and azithromycin in B. rodhaini-infected mice in the current study. Nonetheless, we plan to explore these interspecies differences in our future work.

      Comment 2: The relapse that follows a 7-day treatment with cipargamin is transient in BALB/c mice infected with Babesia rodhaini (Figure 1E) but persistent in SCID mice infected with Babesia microti (Figure 5C). It would be valuable to provide a possible explanation.

      Thank you for your insightful comment. One possible explanation is the difference in immune status between the two mouse models. BALB/c mice have a fully functional immune system that can likely clear residual parasites following a transient relapse after cipargamin treatment. In contrast, SCID mice lack an adaptive immune response, which might allow residual B. microti parasites to persist and cause a sustained relapse. Additionally, intrinsic differences between B. rodhaini and B. microti, such as growth rate or drug susceptibility, could also play a role. We plan to explore these factors in future studies.

      Comment 3: The effect of cipargamin on parasite pH is the greatest when assessed 4 to 8 min after exposure is initiated (Figure 3E). Yet, resistance of parasites that carry a mutation in ATP4, the target of cipargamin, was assessed 20 min after cipargamin addition. At this time point, cipargamin has very little effect (Figure 3E). Accordingly, data reported in Figure 3G are of limited value.

      Thank you for your comment. The initial pH increase we see around 4 to 8 minutes likely reflects the rapid inhibition of ATP4-mediated Na⁺/H⁺ exchange by cipargamin, which quickly alkalinizes the cell. After this initial increase, however, compensatory processes, such as proton influx or metabolic acid production, gradually restore the pH, resulting in a later decline. Although assessing the pH at 20 minutes may have captured less dramatic changes, it still allowed us to compare the sustained differences between wild-type and mutant strains. We agree that including earlier time points for the mutants might provide further insight, and we will consider this in our future work.

      Comment 4: In Figure 3H, please report the lack of statistical significance between wild-type parasites and parasites that carry the mutation L921V.

      In Figure 3H, the ATPase activity in erythrocytes infected with wild-type parasites (6.31 ± 1.20 nmol Pi/mg protein/min) is higher than that of the L921V mutation (5.11 ± 0.50 nmol Pi/mg protein/min), but the difference is not statistically significant (P = 0.095), so no asterisk was added.

      Comment 5: Tafenoquine was administered as a single 20 mg/kg dose. Please specify whether this dose is for tafenoquine succinate or tafenoquine base.

      Thank you for raising this point. In our study, the single 20 mg/kg dose refers to tafenoquine succinate. We have clarified this detail in the revised manuscript (Line 40).

      Comment 6: A single dose of 20 mg/kg tafenoquine succinate was first tested in the SCID mouse model of severe babesiosis by Mordue et al (JID 2019), not by Liu et al. (JID 2024). Please amend discussion accordingly (line 311). As correctly stated in the discussion, the single 20 mg/kg dose was not sufficient to prevent relapse of Babesia microti in the study by Mordue et al. Please provide a possible explanation for why no parasitemia was detected for 90 days in your SCID model (Figure 5C).

      Thank you for your comment. We have modified the suggested citation (Line 309). As noted by Mordue et al. (JID 2019), a single 20 mg/kg dose of tafenoquine succinate was insufficient to prevent relapse in their SCID mouse model using B. microti (ATCC 30221 Gray strain). In our study, however, no parasitemia was detected for 90 days (Figure 5C) using the B. microti Peabody mjr strain (ATCC PRA-99). Differences in the parasite strain and the timing of treatment relative to infection may have contributed to the extended suppression of parasitemia observed in our study. We plan to explore these aspects in future work.

      Comment 7: Real-time PCR was used to confirm eradication of Babesia microti infection (Figure 5D). Please specify the blood volume from which genomic DNA was extracted for each mouse. Please specify the amount of genomic DNA (i.e., not the volume) used in each reaction. Please explain how/why the cut-off was set at 35 cycles. What were the Ct values when blood was obtained from uninfected mice? For infected mice treated with cipargamin plus tafenoquine, there was no amplification. Was each reaction subjected to a maximum of 40 cycles (as suggested by Figure 5D)?

      In our qPCR assay, genomic DNA was extracted from 200 µL of blood per mouse (Line 458). In each reaction, we used 100 ng of genomic DNA (Line 464), and the thermocycling conditions were set at 40 cycles. We set the cut-off at 35 cycles based on our optimization experiments: samples with Ct values ≤ 35 consistently indicated the presence of parasite DNA, while samples without parasite DNA (distilled water and DNA from uninfected mice) had Ct values > 35 or showed no detectable amplification. Although each reaction was run for 40 cycles, for our analysis we defined samples as negative if no signal was observed by cycle 35. In mice treated with cipargamin plus tafenoquine, no signal was detected through 40 cycles, indicating the absence of parasite DNA in the samples.
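
      The cut-off rule described above can be sketched as a small classification function. This is only an illustrative sketch, not the authors' analysis code: the function name `classify_ct`, the use of `None` for "no amplification", and the sample values are all assumptions introduced here. Only the 35-cycle cut-off and the 40-cycle run length come from the response.

```python
from typing import Optional

CT_CUTOFF = 35.0  # cut-off stated in the response
MAX_CYCLES = 40   # cycles run per reaction, as stated in the response


def classify_ct(ct: Optional[float], cutoff: float = CT_CUTOFF) -> str:
    """Call a sample 'positive' if its Ct is at or below the cut-off.

    `ct=None` is a hypothetical convention meaning that no amplification
    was detected within MAX_CYCLES; such samples are called 'negative'.
    """
    if ct is None:
        return "negative"
    if ct < 0 or ct > MAX_CYCLES:
        raise ValueError(f"Ct must lie between 0 and {MAX_CYCLES}, got {ct}")
    return "positive" if ct <= cutoff else "negative"


# Hypothetical Ct values for illustration only (not data from the study).
samples = {"example_a": 24.3, "example_b": 36.8, "example_c": None}
calls = {name: classify_ct(ct) for name, ct in samples.items()}
```

      Under this rule a late signal between cycles 35 and 40 is treated the same as no amplification, which matches how the no-template and uninfected controls are described above.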

      Comment 8: Persistence of parasite DNA in blood of tafenoquine treated mice highlights the limitation of PCR to assess persistence of infection. That is, PCR cannot distinguish between viable parasites and non-viable (or dead) parasites. An adoptive transfer of blood to immunocompromised mice can help determine whether persistence of DNA is due to persistence of viable parasites. Because the experiment was carried out in SCID mice, no adoptive transfer is needed. Few parasites are required for a successful infection of immunocompromised mice (SCID mice included). Given that parasitemia never rose following treatment of SCID mice with a single dose of tafenoquine, it is highly likely that parasite DNA detected on day 90 post-infection in these tafenoquine treated mice came from persistent non-viable/dead parasites.

      We appreciate your comment and acknowledge that the use of PCR has limitations in differentiating between live and dead parasites. It is possible that the residual DNA may represent a small population of dormant parasites that are not actively replicating and thus remain below the detection threshold of parasitemia. Even in highly immunocompromised SCID mice, such dormant parasites might persist without causing overt infection under our experimental conditions. An adoptive transfer experiment in SCID mice, although not strictly necessary, could validate whether the detection of low levels of DNA comes from viable parasites capable of reactivating under different circumstances. Future studies using more sensitive viability assays or adoptive transfer approaches could provide further insights into this possibility.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. 
      These results call into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons, which we have largely silenced, and the downstream endogenous activity that is perturbed. The effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that mediate each region’s interaction with other regions. Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that a silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns. This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it. 

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortex at a particular point during a motor behavior. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences in relation to movement execution, as disturbance to processes on which execution depends can impede execution itself. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous.

      That said, we would agree that the form of the causal interaction between RFA and CFA remains unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as knocking out a transcription factor gene does not expose how the transcription factor influences the expression of other genes. To show evidence for a specific type of interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). There thus is not much room for the effects on projection neurons in RFA to be much larger. We have measured these local effects in RFA as part of other work (Kristl et al., bioRxiv, 2025), verifying that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing analyses of activity in these two regions have produced conflicting results on the strength of interaction between them during preparation. Compare, for example, Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach - a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between regions may be strongest. The similarity in alignment across lags we observed might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
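
      The intuition in the preceding paragraph can be illustrated with a toy simulation. The sketch below is not the CCA/PLS analysis from the manuscript: it uses a plain Pearson correlation on synthetic white-noise signals, and the function name `lagged_correlation`, the delays, and the signal lengths are all assumptions chosen for illustration. It contrasts a downstream signal that copies its input at a single latency with one that averages the input over a range of latencies.

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for each integer lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[: x.size - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: y.size + lag]
        corrs[i] = np.corrcoef(a, b)[0, 1]
    return lags, corrs


# Synthetic "upstream" signal; extra samples at the start allow clean delays.
rng = np.random.default_rng(0)
n, pad = 5000, 20
base = rng.standard_normal(n + pad)
x = base[pad:]


def delayed(d):
    """Return the signal y with y[t] = x[t - d] (d in samples, 0 <= d <= pad)."""
    return base[pad - d : pad - d + n]


# Downstream signal driven at one latency versus a spread of latencies.
y_single = delayed(5)
y_broad = np.mean([delayed(d) for d in range(0, 11)], axis=0)

lags, c_single = lagged_correlation(x, y_single, max_lag=15)
_, c_broad = lagged_correlation(x, y_broad, max_lag=15)
```

      In this toy case `c_single` has one sharp peak at the imposed delay, whereas `c_broad` is roughly flat across the whole range of imposed delays. Real population activity is temporally smooth and multidimensional, so this is only a caricature of why a broad distribution of latencies could remove a clear optimal lag.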

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach aligned with those applied in the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation that is based on other differences in what is calculated by DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may very well rely on distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could reveal interesting structure in interregional interactions. Since it remains a challenge to rigorously identify a subset of neural activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the activity that decoders use for predicting muscle activity matches the activity that actually drives muscle activity in situ.

      To address this issue, which is related to one raised by Reviewer #3 below, we have added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details.

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis that we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry following functional influence, our results imply that the remaining activity components would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS do show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses were performed on components accounting for well over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components should be included.

      To address the concern about condition-dependent and condition-invariant components, we have added a sentence to the Results section reporting our CCA and PLS results: “Because our results here involve the vast majority of trial-averaged activity variance, we expect that they encompass both components of activity that vary for different movement conditions (condition-dependent), and those that do not (condition-invariant).” To address the general concerns about potential differences in activity components specifically related to muscle activity, we have also added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict it. Here we used modeling as it is often used: to capture experimental results and generate hypotheses about potential explanations. We do feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study, requiring numerous controls - a whole other paper in itself.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) There are a few small text/figure caption modifications that can be made for clarity of reading:

      (2) Unclear sentence in the second paragraph of the introduction: "For example, stimulation applied in PM has been shown to alter the effects on muscles of stimulation in M1 under anesthesia, both in monkeys and rodents."

      This sentence has been rephrased for clarity: “For example, in anesthetized monkeys34 and rodents35, stimulation in PM alters the effects of stimulation in M1 on muscles.”

      (3) The first section of the results presents the optogenetic manipulation. However, the critical control that tests whether this was strictly a local manipulation that did not affect cells in the other region is introduced only much later. It may be helpful to add a comment in this section noting that such a control was performed, even if it is explained in detail later when introducing the recordings.

      We have added the following to the first Results section: “we show below that direct optogenetic effects were only seen in the targeted forelimb area and not the other.”

      (4) Figure 1D - I imagine these averages are from a single animal, but this is not stated in the figure caption.

      “For one example mouse,” has been added to the beginning of the Figure 1D legend.

      (5) Figure 2F - N=6 is not stated in the panel's caption (though it can make it clearer), while it is stated in the caption of 2H.

      “n = 6 mice” has been added to the Figure 2F legend.

      (6) There's some inconsistency with the order of RFA/CFA in the figures, sometimes RFA is presented first (e.g., Figure 1D and 1F), and sometimes CFA is presented first (e.g., panels of Figure 2).

      We do not foresee this leading to confusion.

      (7) "As expected, the majority of recorded neurons in each region exhibited an elevated average firing rate during movement as compared to periods when forelimb muscles were quiescent (Figure 2D,E; Figure S1A,B)" - Figure S1A,B show histograms of narrow vs. wide waveforms, is this the relevant figure here?

      We apologize for the cryptic reference. The waveform width histograms were referred to here because they enabled the separation of narrow- and wide-waveform cells shown in Figure 2D,E. We have added the following clause to the referenced sentence to make this explicit:  “, both for narrow-waveform, putative interneurons and wide-waveform putative pyramidal neurons.”

      (8) Figure 2I caption - "The fraction of activity variance from 150 ms before reach onset to 150 ms after it that occurs before reach onset" - this sentence is not clear.

      The Figure 2I legend has been updated to “The activity variance in the 150 ms before muscle activity onset, defined as a fraction of the total activity variance from 150 ms before to 150 ms after muscle activity onset, for each animal (circles) and the mean across animals (black bars, n = 6 mice).”
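
      For illustration only, the quantity described in this legend can be sketched as below. This is not the analysis code used in the study: the function name, the input layout (one trial-averaged trace per neuron or component, with bin times relative to muscle activity onset), and the choice to measure variance as squared deviation from each trace's mean over the full window are all assumptions made for the sketch.

```python
def pre_onset_variance_fraction(traces, times_ms, window_ms=150):
    """Fraction of activity variance in [-window, +window) that falls before onset.

    traces   : list of trial-averaged traces (one list of values per neuron/component)
    times_ms : bin times relative to muscle activity onset (0 = onset)
    Assumption: variance is the summed squared deviation from each trace's
    mean over the full window; the study may define it differently.
    """
    idx = [i for i, t in enumerate(times_ms) if -window_ms <= t < window_ms]
    pre = 0.0
    total = 0.0
    for trace in traces:
        window_mean = sum(trace[i] for i in idx) / len(idx)
        for i in idx:
            sq_dev = (trace[i] - window_mean) ** 2
            total += sq_dev
            if times_ms[i] < 0:
                pre += sq_dev
    return pre / total if total > 0 else float("nan")


# Hypothetical example: 10 ms bins spanning -150 ms to +140 ms.
times = list(range(-150, 150, 10))
symmetric = [abs(t + 5) for t in times]           # mirror-symmetric about the window centre
late_ramp = [0 if t < 0 else t for t in times]    # flat before onset, ramps after

frac_symmetric = pre_onset_variance_fraction([symmetric], times)
frac_late_ramp = pre_onset_variance_fraction([late_ramp], times)
```

      Under these assumptions a trace that is symmetric around onset yields a fraction of one half, whereas a trace that changes mainly after onset yields a smaller fraction.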

      (9) Figure 4B-G - is this showing results across the 6 animals? Not stated clearly.

      Yes - the 21 sessions we had referred to are drawn from all six mice. We have updated the legend here to make this explicit.

      (10) DLAG analysis - is there any particular reasoning behind choosing four across-region and four within-region components?

      In actuality, we completed this analysis for a broad range of component numbers and obtained similar results in all cases. Four fell in the center of our range, and so we focused the illustrations shown in the figure on this value. In general, the number of components is arbitrary. The original paper from Gokcen et al. describes a method for identifying a lower bound on the number of distinct components the method can identify. However, this method yields different results for each individual recording session. For the comparisons we performed, we needed to use the same range of values for each session.

      (11) Figure 5A seems to show 11 across-session components, it's unclear from the caption but I imagine this should show 12 (4 components times 3 sessions?)

      As we state in the Methods, any across-region latent variable with a lag that failed to converge between the boundary values of ±200 ms was removed from the analysis. In the case illustrated in this panel, the lag for one of the components failed to converge and is not shown. We have now clarified this both in the relevant Results paragraph and in the figure legend.

      (12) Figure 5B - is each marker here the average variance explained by all across/within components that were within the specified lag criteria across sessions per mouse? In other words, what does a single marker here stand for?

      We apologize for the lack of clarity here. These values reflect the average across sessions for each mouse. We have updated the legend to make this explicit.

      Reviewer #2 (Recommendations for the authors):

      As I have addressed most of my major recommendations in the public review, I will use this section to include relatively minor points for the authors to consider.

      (1) The EMG data in Figure 1C shows distinct patterns across spouts, both in the magnitude and complexity of muscle activations. It would be interesting to investigate whether these differences in muscle activity lead to behavioral variations (e.g., reaction time, reach duration) and how they relate to the relative involvement of the two areas.

      We agree that it would be interesting to examine how the interactions between areas vary as behavior varies. While the differences between reaches here are limited, we have addressed this question for two substantially different motor behaviors (reaching and climbing) in a follow-up study that was recently preprinted (Kristl et al., biorxiv, 2025).

      (2) How do the authors account for the lingering impact of RFA inactivation on muscle activity, which persists for tens of milliseconds after laser offset? Could this effect be due to compensatory motor activity following the perturbation? A further illustration of how the raw limb trajectories and/or muscle activity are perturbed and recovered would help readers better understand the impact of motor cortical inactivation.

      To clarify the effects of inactivation on a longer timescale, we have added a new supplemental figure showing the plots from Figure 1D over a longer time window extending to 500 ms after trial onset (new Figure S1). Lingering effects do persist, at least in certain cases. In general, we find it hard to ascertain the source of optogenetic effects on longer timescales like this. On the shortest timescales, effects will be mediated by relatively direct connections between regions. However, on these longer timescales, effects could be due to broader changes in brain and behavioral state that can influence muscle activity. For example, attempts to compensate for the initial disturbance to muscle activity could cause divergence from controls on these longer timescales. Muscle tissue itself is also known to have long-timescale relaxation dynamics, and it would not be surprising if the relevant control circuits here also had long-timescale dynamics, such that we would not expect an immediate return to control when the light pulse ends. Because of this ambiguity, we generally avoid interpretation of optogenetic effects on these longer timescales.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 9: ". We measured the time at which the activity state deviated from baseline preceding reach onset," - I cannot find how this deviation was defined (neither the baseline nor the threshold).

      We have added text to the Figure 2G legend that explicitly states how the baseline and activity onset time were defined.

      (2) Given the shape of the curves in Figure 2G, the significance of this result seems susceptible to slight modifications of what defines a baseline or a deviation threshold. For example, it looks like the circle for CFA has a higher y-axis value, suggesting the baseline deviance is higher, but it is unclear why that would be from the plot. If the threshold for deviation in neural activity state were held uniform between CFA and RFA is the difference still significant across animals?

      We have repeated the analysis using the same absolute threshold for each region. We used the higher of the two thresholds from each region. The difference remains significant. This is now described in the last paragraph of the Results section for Figure 2.
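
      As a rough sketch of the shared-threshold comparison described here (not the code used in the study), one could compute a threshold per region from a baseline window, take the higher of the two, and apply it to both regions. The baseline window, the mean-plus-three-standard-deviations rule, the function names, and the synthetic traces below are all hypothetical choices made for the illustration; the definitions actually used are those given in the Figure 2G legend.

```python
from statistics import mean, stdev


def region_threshold(deviation, times_ms, baseline_end_ms, n_sd=3.0):
    """Threshold from a baseline window: mean + n_sd * SD (an assumed rule)."""
    baseline = [d for d, t in zip(deviation, times_ms) if t < baseline_end_ms]
    return mean(baseline) + n_sd * stdev(baseline)


def onset_time(deviation, times_ms, threshold):
    """Time of the first bin whose deviation exceeds the threshold, else None."""
    for d, t in zip(deviation, times_ms):
        if d > threshold:
            return t
    return None


# Synthetic deviation traces (hypothetical): 10 ms bins, times relative to reach onset.
times = list(range(-300, 100, 10))
rfa = [0.1 * (i % 2) + 0.05 * max(0, t + 100) for i, t in enumerate(times)]
cfa = [0.2 * (i % 2) + 0.05 * max(0, t + 50) for i, t in enumerate(times)]

thr_rfa = region_threshold(rfa, times, baseline_end_ms=-200)
thr_cfa = region_threshold(cfa, times, baseline_end_ms=-200)

# Apply the same absolute threshold (the higher of the two) to both regions.
shared_threshold = max(thr_rfa, thr_cfa)
onset_rfa = onset_time(rfa, times, shared_threshold)
onset_cfa = onset_time(cfa, times, shared_threshold)
```

      Using the higher threshold for both regions is the conservative choice: it can only delay the detected onset in the region with the quieter baseline, so an earlier onset that survives it is not explained by a more lenient criterion.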

      (3) Since summed deviation of the top 3 PCs is used to show a difference in activity onset between CFA/RFA, but only a small proportion of variance is explained pre-movement (<2% in most animals), it seems relevant to understand what percentage of CFA/RFA neuron activity actually is modulated and deviates from baseline prior to movement and to show the distribution of activity onsets at the single neuron level in CFA/RFA. Can an onset difference only be observed using PCA? 

      Because many neurons have low firing rates, estimating the time at which their firing rate begins to rise near reach onset is difficult to do reliably. It is also true that not all neurons show an increase around onset - some show a decrease and others show no discernible change. Using PCs to measure onset avoids both of these problems, since they capture both increases and decreases in individual neuron firing rates and are much less noisy than individual neuron firing rates. 

      However, based on this comment, we have repeated this analysis on a single-neuron level using only neurons with relatively high average firing rates. Specifically, we analyzed neurons with mean firing rates above the 90th percentile across all sessions within an animal. Neurons whose activity never crossed threshold were excluded. Results matched those using PCs, with RFA neurons showing an earlier average activity onset time. This is now described in the last paragraph of the Results section for Figure 2.

      (4) It is stated that to study the impact of inactivation on CFA/RFA activity, only the 50 highest average firing rate neurons were used (and maybe elsewhere too, e.g., convergent cross mapping). It is unclear why this subselection is necessary. It is justified by stating that higher firing rate neurons have better firing rate estimates. This may be supportable for very low firing rate units that spike sorting tools have a hard time tracking, but I don't think this is supported by data for most of the distribution of firing rates. It therefore seems like the results might be biased by a subselection of certain high firing rate neuron populations. It would be useful to also compute and mention if the results for all neurons/neuron pairs are the same. If there is worry about low-quality units being those with low firing rates, a threshold for firing rate as used elsewhere in the paper (at least 1 spike / 2 trials) seems justified.

      The issue here is that as firing rates decrease and firing rate estimates get noisier, estimates of the change in firing rate get more variable. Here we are trying to estimate the fraction of neurons for which firing rates decreased upon inactivation of the other region. Variability in estimates of the firing rate change will bias this estimate toward 50%, since in the limit when the change estimates are entirely based on noise, we expect 50% to be decreases. As expected, when we use increasingly liberal thresholds for this analysis, the fraction of decreases trends closer to 50%. 

      As a consequence of this, we cannot easily distinguish whether higher firing rate neurons might for some reason have a greater tendency to exhibit decreases in firing compared to lower firing rate neurons. However, we see no positive reason to expect such a difference. We have added a sentence noting this caveat in interpreting our findings to the relevant paragraph of the Results.
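
      The bias toward 50% described above can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a re-analysis of the data: the true fraction of decreasing neurons, the effect size, the Gaussian noise model, and all names are hypothetical.

```python
import random


def fraction_decreases(true_fraction, effect, noise_sd, n_neurons=2000, seed=0):
    """Fraction of simulated neurons whose *estimated* firing rate change is negative.

    true_fraction : fraction of neurons whose true change is -effect (the rest +effect)
    noise_sd      : SD of Gaussian noise added to each neuron's change estimate
    """
    rng = random.Random(seed)
    n_decreases = 0
    for _ in range(n_neurons):
        true_change = -effect if rng.random() < true_fraction else effect
        estimated_change = true_change + rng.gauss(0.0, noise_sd)
        if estimated_change < 0:
            n_decreases += 1
    return n_decreases / n_neurons


# Same underlying population (80% true decreases), two levels of estimation noise.
low_noise = fraction_decreases(0.8, effect=1.0, noise_sd=0.2)
high_noise = fraction_decreases(0.8, effect=1.0, noise_sd=5.0)
```

      With low estimation noise the measured fraction stays close to the true 80%, whereas with noise much larger than the effect it moves toward 50%, which is the pattern expected when lower firing rate neurons are admitted to the analysis.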

      The lack of min/max axis values in Figure 3B-F makes it hard to interpret - are these neurons almost silent when near the bottom of the plot or are they still firing a substantial # of spikes?

      To aid interpretation of the relative magnitude of firing rate changes, we have added minimum firing rates for the averages depicted in Figure 3B,C,E and F to the legend. Our original thinking was that the plots in Figure 3G and H would provide an indication of the relative changes in firing.

      It would be interesting to know if the impact of optogenetic stimulation changed with exposure to the manipulation. Are all results presented only from the first X number of sessions in each animal? Or is the effect robust over time and (within the same animal) you can get the same results of optogenetic inactivation over time? This information seems critical for reproducibility.

      We have now performed brief optogenetic inactivations in several brain areas in several different behavioral paradigms, and have found that inactivation effects are stable both within and across sessions, almost surprisingly so. This includes cases where the inactivations were more frequent (every ~1.25 s on average) and more numerous (>15,000 trials per animal) than in the present manuscript. Thus we did not restrict our analysis here to the first X sessions or trials within a session. We have added additional plots as Figure S3T-AA showing the stability of optogenetic effects both within and across sessions.

      Given that it can be difficult to record from interneurons (as the proportion of putative interneurons in Figure S1 attests), the SALT analyses would be more convincing if a few recordings had been performed in the same region as optogenetic stimulation to show a "positive control" of what direct interneuron stimulation looks like. Could also use this to validate the narrow/wide waveform classification.

      We have verified that using SALT as we have in the present manuscript does detect vGAT+ interneurons directly responding to light. This is included in a recent preprint from the lab (Kristl et al., biorxiv, 2025). We (Warriner et al., Cell Reports, 2022) and others (Guo et al., Neuron, 2014) have previously used direct ChR2 activation to validate waveform-based classification.

      Simultaneous CFA/RFA recordings during optogenetic perturbation would also allow for time courses of inhibition to be compared in RFA/CFA. Does it take 25ms to inhibit locally, and the cross-area impact is fast, or does it inactivate very fast locally and takes ~25ms to impact the other region?

      Latencies of this sort are difficult to precisely measure given the statistical limits of this sort of data, but there does appear to be some degree of delay between local and downstream effects. We do not have a statistical foundation as of yet for concluding that this is the case. It will be interesting to examine this issue more rigorously in the future.

      Given the difference in the analytical methods, the authors should share data in a relatively unprocessed format (e.g., spike times from sorted units relative to video tracking + behavioral data), along with analysis code, to allow others to investigate these differences.

      We plan to post the data and code to our lab’s Github site once the Version of Record is online.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, the authors reveal that the MK2 inhibitor CMPD1 can inhibit the growth, migration, and invasion of breast cancer cells both in vitro and in vivo by inducing microtubule depolymerization, preferentially at the microtubule plus-end, leading to cell division arrest, mitotic defects, and apoptotic cell death. They also showed that CMPD1 treatment upregulates genes associated with cell migration and cell death, and downregulates genes related to mitosis and chromosome segregation in breast cancer cells, suggesting a potential mechanism of CMPD1 inhibition in breast cancer. Besides, they used the combination of an MK2-specific inhibitor, MK2-IN-3, with the microtubule depolymerizer vinblastine to simultaneously disrupt both the MK2 signaling pathway and microtubule dynamics, and they claim that inhibiting the p38-MK2 pathway may help to enhance the efficacy of MTAs in the treatment of breast cancer. However, there are a few concerns, including:

      (1) What is the effect of CMPD1 on breast cancer metastasis?

      In this study, we hypothesized that the MK2 signaling pathway could synergize with microtubule-targeting agents (MTAs) to enhance anti-cancer efficacy. We utilized CMPD1 as a potent dual-function inhibitor, targeting both MK2 and microtubule dynamics. By simultaneously inhibiting these pathways, CMPD1 not only recapitulates the therapeutic impact of MTAs but also significantly suppresses breast cancer cell migration and invasion. Therefore, we propose that CMPD1, through its dual inhibition of MK2 activity and microtubule dynamics, may offer enhanced specificity and efficacy in preventing breast cancer metastasis and limiting tumor progression.

      (2) The mechanism is lacking as to how MK2 inhibitors enhance the efficacy of MTAs.

      Thank you for the valuable suggestion. We agree that our current findings do not fully elucidate the underlying mechanism by which MK2 inhibition synergistically enhances the efficacy of MTAs. We recognize this as an important area for further investigation and are committed to exploring the molecular interplay between MK2 signaling and microtubule dynamics in future studies. A deeper mechanistic understanding will be critical to establishing a strong rationale for the potential co-treatment of MK2 inhibitors and MTAs in clinical breast cancer therapy.

      Reviewer #2 (Public review):

      Summary:

      This study explores the potential of inhibiting the p38-MK2 signaling pathway to enhance the efficacy of microtubule-targeting agents (MTAs) in breast cancer treatment using a dual-target inhibitor.

      Strengths:

      The study identifies the p38-MK2 pathway as a promising target to enhance the efficacy of microtubule-targeting agents (MTAs), offering a novel therapeutic strategy for breast cancer treatment. In addition, the study employs a wide range of techniques, especially live-cell imaging, to assess the microtubule dynamics in TNBC cells.

      We sincerely appreciate your recognition of the significance and impact of our work.

      Weaknesses:

      The study primarily uses RPE1 cells as the control for normal cells, which may not fully capture the response of normal mammary epithelial cells. While CMPD1 is shown to be effective in suppressing tumor growth in MDA-MB-231 xenograft, the study lacks detailed toxicity data to confirm its safety profile in vivo.

      Thank you for your valuable suggestions. In the revised manuscript, we have included CMPD1 treatment in MCF10A cells, a more appropriate non-transformed control line commonly used in breast cancer research. Notably, MCF10A cells exhibited results similar to those observed in RPE1 cells, further reinforcing our conclusion that breast cancer cells display increased sensitivity to CMPD1 treatment. These new findings are presented in Figure 2-Supplement 1A-C. Additionally, we performed further xenograft experiments using CAL-51 and MDA-MB-231 cells. We collected data on tumor growth, mouse body weight, survival rates, and other relevant parameters to comprehensively assess toxicity. The newly obtained results are presented in Figure 3F-G and Figure 3-Supplement 1-3.

      Reviewer #3 (Public review):

      Summary:

      The authors demonstrated that MK2i could enhance the therapeutic efficacy of MTAs. Using tumor xenograft and migration assays, the authors suggested that the p38-MK2 pathway may serve as a promising therapeutic target in combination with MTAs in cancer treatment.

      Strengths:

      The authors provided a potential treatment for breast cancer.

      Thank you for recognizing the importance and significance of our work.

      Weaknesses:

      (1) In Figure 2, the authors used a human retinal pigment epithelial-1 (RPE1) cell line to show that breast cancer cells are more sensitive to CMPD1 treatment. MCF10A cells would be suggested here as a suitable control. Besides, to compare the sensitivity, IC50 in different cell lines should be measured.

      In the revised manuscript, we have addressed these points by determining the IC50 values for CMPD1 in MDA-MB-231, CAL-51, MCF10A, and CAL-51 p53 knockout cells. These new results are presented in Figure 2-Supplement Figure 3.
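
      For readers unfamiliar with how an IC50 is read from a dose-response series, a minimal sketch follows. It is not the procedure used in the manuscript, which is not described in this response: it assumes viability expressed as a fraction of the untreated control and estimates the IC50 by linear interpolation on log-concentration between the two doses that bracket 50% viability, a simpler stand-in for a fitted dose-response curve. The function name and the example values are hypothetical.

```python
import math


def ic50_log_interp(concentrations, viability):
    """Estimate IC50 by interpolating on log10(concentration).

    concentrations : ascending doses, all in the same unit (e.g. nM)
    viability      : fraction of untreated control at each dose (1.0 = no effect)
    Returns None if viability never crosses 0.5 within the tested range.
    """
    points = list(zip(concentrations, viability))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Made-up values for illustration; these are not data from the study.
example_ic50 = ic50_log_interp([10, 100, 1000], [0.9, 0.5, 0.1])
no_crossing = ic50_log_interp([10, 100, 1000], [0.95, 0.9, 0.8])
```

      Comparing cell lines on this scale is only meaningful when the same assay, exposure time, and normalization are used for each line.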

      (2) The data of MDA-MB-231 in Figure 1D is not consistent with CAL-51 and T47D, also not consistent with the data in Figures 2B-C.

      In the revised manuscript, we have included all relevant statistical analyses in Figure 1D. In MDA-MB-231 cells, there are no statistically significant differences in mitotic duration between 1 µM and 5 µM, 5 µM and 10 µM, or 1 µM and 10 µM CMPD1 treatments. Similarly, no significant differences are observed between 1 µM and 5 µM or 5 µM and 10 µM CMPD1 treatments in CAL-51 cells, and between 5 µM and 10 µM in T-47D cells. These results suggest that mitotic duration does not exhibit a clear dose-dependent relationship within the 1–10 µM range, likely because mitotic arrest has reached a near-plateau effect at these concentrations.

      It is also important to note that the experimental conditions in Figures 1 and 2 are fundamentally different. Figure 1 investigates the effects of higher concentrations of CMPD1 (≥1 µM), which severely disrupt microtubule organization and result in robust mitotic arrest, with cells arrested in mitosis for over 8 hours. In contrast, the conditions in Figure 2 utilize much lower concentrations of CMPD1 (10–50 nM), which are insufficient to cause complete microtubule depolymerization, but are capable of inducing a subtle yet statistically significant mitotic delay, particularly in breast cancer cell lines. These lower concentrations were chosen to mimic clinically relevant intratumoral drug levels. Previous studies have reported that paclitaxel (PTX) concentrations in patient tumors approximate ~50 nM when modeled in vitro. At these physiologically relevant levels, PTX does not induce strong mitotic arrest but instead causes moderate delays that result in division errors and chromosomal instability, ultimately contributing to cancer cell death. In this study, the conditions used in Figure 2 emulate these clinically relevant concentrations for CMPD1. We found that, similar to PTX, low-dose CMPD1 induces a slight but significant mitotic delay without triggering a full mitotic arrest. Notably, unlike PTX, CMPD1 appears to exert this effect selectively in breast cancer cells, contributing to mitotic errors and potentially enhancing therapeutic efficacy through targeted chromosomal instability.

      (3) To support the authors' conclusion in Figure 5, an additional animal experiment performed by tail vein injection would be helpful.

      While current technical limitations have precluded us from conducting this suggested experiment in this study, we have performed complementary xenograft studies using CAL-51 cells treated with CMPD1. These experiments included a comprehensive toxicity analysis. Furthermore, we carried out an in vitro migration assay using CAL-51 cells under combined treatment with the MK2 inhibitor and vinblastine. These additional findings are presented in Figure 3–Supplement 1–3 and Figure 6–Supplement 3. We recognize the importance of the suggested tail vein injection approach and are actively pursuing further mechanistic studies, including this experiment, in our ongoing and future work.

      (4) Page 14, to evaluate the combination result of MK2i and vinblastine, an in vivo animal assay must be performed.

      We appreciate the reviewer’s valuable suggestion. We are actively investigating the synergistic mechanisms between the MK2 inhibitor and microtubule-targeting agents (MTAs). In future studies, we plan to extend our findings by conducting xenograft experiments to further evaluate their therapeutic potential in vivo.

      (5) The authors used RNA-seq to show some pathways affected by CMPD1. What are the key/top genes that were affected? How about the mechanism?

      In the revised manuscript, we have included the top 20 upregulated and downregulated genes identified from RNA-seq analysis using MDA-MB-231 cells. These new data are presented in Figure 6-Supplement Figure 4. Gene Ontology (GO) Biological Process (BP) pathway enrichment analysis revealed that the most significantly enriched pathways among upregulated genes are associated with cell migration, whereas the downregulated genes are primarily involved in mitosis and chromosome segregation. These transcriptional changes are consistent with the phenotypic outcomes observed in our experiments, supporting the functional relevance of CMPD1 treatment. However, further investigation will be necessary to elucidate the detailed molecular mechanisms underlying these effects.

      (6) Line 127, more experiments should be involved to support the conclusion.

      In the revised manuscript, we have addressed this point by performing additional experiments, including determination of the IC₅₀ values of CMPD1 in MDA-MB-231, CAL-51, MCF10A, and CAL-51 p53 knockout cells. We also conducted live-cell imaging analyses using MCF10A cells. These new results further reinforce our conclusion that breast cancer cells are more sensitive to CMPD1 treatment than normal breast epithelial cells, and that this sensitivity is independent of p53 status. The new data are presented in Figure 2-Supplement Figures 1 and 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1D: As the concentration of CMPD1 increased, the mitotic duration of MDA-MB-231 cells decreased, why was that?

      Although there appears to be a slight decrease in mitotic duration with increasing concentrations of CMPD1, our quantitative analysis reveals no statistically significant differences among the 1 to 10 µM treatment groups in MDA-MB-231 cells. In the revised manuscript, we have included all relevant statistical analyses in Figure 1D for clarity. Importantly, all CMPD1-treated groups exhibit a pronounced and statistically significant prolongation of mitosis compared to the DMSO-treated control. While the average mitotic duration in control cells is approximately 30 minutes, cells exposed to 1–10 µM CMPD1 consistently display mitotic durations exceeding 8 hours, indicating a strong and sustained mitotic arrest across this concentration range.

      Reviewer #2 (Recommendations for the authors):

      (1) The rationale for using RPE1 as normal cell control instead of normal mammary epithelial cells as control is unclear. Using normal mammary epithelial cells such as MCF10A for the study is recommended.

      Thank you for this valuable suggestion. In the revised manuscript, we have included additional experiments using non-transformed mammary epithelial MCF10A cells. The new data, presented in Figure 2-Supplement Figures 1 and 3, include both IC50 measurements and live-cell imaging analyses. These results further support our conclusion that breast cancer cells are significantly more sensitive to CMPD1 treatment compared to normal mammary epithelial cells.

      (2) It is intriguing that CAL-51 cells are more sensitive to CMPD1 than MDA-MB-231 cells; examining how p53 signaling changes in these cells would be worthwhile.

      We appreciate this insightful comment. In the revised manuscript, we have measured the IC₅₀ values for both CAL-51 and CAL-51 p53 knockout (p53KO) cells. The results show no significant difference in CMPD1 sensitivity between the two, suggesting that the enhanced sensitivity of CAL-51 cells is independent of p53 status. These new findings are presented in Figure 2-Supplement Figure 3.

      (3) Figures S1A and B are not described and cited in the main text.

      We apologize for this oversight. In the revised manuscript, we have correctly cited and described Figures S1A and B (Figure 2-Supplement Figure 2 A-B in revised manuscript) in the main text.

      (4) I'm not that convinced by the conclusion made from Lines 201-204. First, Figure S2C, which is the growth of tumor volume, does not reflect the toxicity of the drug treatment. No additional data evaluating the toxicity (such as body weight change) under the regimen was shown. Second, although the tumor weight by the endpoint indicated some anti-tumor effect in the MDA-MB-231 xenograft model, the tumor volume does not show the same pattern (the dot lines do not well distinguish which group from which). I would suggest repeating the in vivo experiment using CAL-51 cells since it is more sensitive to CMPD1 according to the previous data.

      Thank you for this thoughtful and constructive feedback. In the revised manuscript, we have addressed these concerns through several additional experiments. We performed new xenograft studies using CAL-51 TNBC cells, in parallel with further toxicity-focused analyses in the MDA-MB-231 model. Consistent with previous results, CMPD1 treatment significantly suppressed tumor growth in CAL-51 xenografts (Figure 3F-G), further supporting its efficacy in a more sensitive cell line. To evaluate drug-associated toxicity, we measured body weight changes throughout the course of treatment. CMPD1-treated mice maintained a comparable weight gain to the control group, whereas mice treated with paclitaxel (PTX) showed significantly reduced body weight (Figure 3-Supplement Figure 2A). Notably, animal deaths occurred only in the PTX-treated groups in both MDA-MB-231 and CAL-51 models (Figure 3-Supplement Figure 2B). We also assessed organ toxicity, including both anatomical and functional evaluations of the kidney and liver, and observed no significant damage in CMPD1-treated mice (Figure 3-Supplement Figures 3A-B and 3D). Furthermore, white blood cell (WBC) counts remained stable in the CMPD1 group, while PTX treatment led to a significant reduction (Figure 3-Supplement Figures 3C-D). These additional data provide strong evidence for the anti-tumor efficacy and lower toxicity of CMPD1 in vivo.

      (5) While I appreciate the combination effect of treating cells with the MK2 inhibitor with vinblastine. I would consider using genetic knockdown as a complementary approach to demonstrate that inhibiting the p38-MK2 pathway synergized with microtubule depolymerizing agents. In addition, could inhibition of the p38-MK2 pathway alone induce the cell growth inhibition observed with CMPD1 treatment?

      Thank you for these important suggestions. In the revised manuscript, we have incorporated siRNA-mediated knockdown of MK2 in combination with vinblastine treatment. This genetic approach revealed synergistic effects on mitotic index and mitotic errors, closely mirroring the phenotypes observed with pharmacological co-treatment using the MK2 inhibitor and vinblastine (Figure 6-Supplement Figure 2A-C). These results further validate the role of the p38-MK2 pathway in modulating mitotic progression in the presence of MTAs. To address whether MK2 inhibition alone is sufficient to impair cell growth, we performed validation experiments using the MK2 inhibitor at 10 µM. At this concentration, the inhibitor effectively blocked phosphorylation of Hsp27, a major downstream substrate of MK2, under H2O2-induced ROS stress conditions (Figure 6-Supplement Figure 1A-B), confirming MK2 signaling pathway inhibition. However, treatment with the MK2 inhibitor alone did not significantly affect cell proliferation, as shown by a 4-day growth curve analysis in CAL-51 cells (Figure 6-Supplement Figure 1C). These findings suggest that inhibition of the p38-MK2 pathway alone is not sufficient to suppress cancer cell growth, and that its synergistic interaction with MTAs, such as vinblastine, is essential for the observed anti-proliferative effects.

      (6) Phenotypic studies (such as anchorage-independent growth and cell migration and invasion assay) of combining MK2 inhibitor with vinblastine in TNBC cells are recommended.

      Thank you for this valuable suggestion. In the revised manuscript, we have conducted cancer cell migration assays using CAL-51 TNBC cells treated with control, MK2 inhibitor alone, vinblastine alone, or the combination of both. Our results demonstrate that the combination treatment significantly enhances the inhibition of cell migration compared to either agent alone (Figure 6-Supplement Figure 3A-C). These findings provide additional phenotypic evidence supporting the synergistic interaction between MK2 inhibition and microtubule-targeting agents in TNBC cells.

      Reviewer #3 (Recommendations for the authors):

      The authors can utilize diverse experiments to support their conclusions.

      Thank you for this important suggestion. In the revised manuscript, we have conducted a series of additional experiments to robustly support our conclusions.

      These include:

      (1) Xenograft studies using CAL-51 TNBC cells, along with comprehensive toxicity evaluations.

      (2) CMPD1 sensitivity analysis in non-transformed MCF10A mammary epithelial cells.

      (3) IC50 measurements in MDA-MB-231, CAL-51, CAL-51 p53 knockout, and MCF10A cells.

      (4) Cell migration assays assessing the combination effects of MK2 inhibitor and vinblastine

      (5) siRNA-mediated genetic knockdown of MK2 to complement pharmacological findings

      Collectively, these additional data sets substantially strengthen the evidence base for our conclusions and provide a more comprehensive mechanistic understanding.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use the teleost medaka as an animal model to study the effect of seasonal changes in day-length on feeding behaviour and oocyte production. They report a careful analysis of how day-length affects female medakas and a thorough molecular genetic analysis of genes potentially involved in this process. They show a detailed analysis of two genes and include a mutant analysis of one gene to support their conclusions.

      Strengths:

      The authors pick their animal model well and exploit the possibilities to examine in this laboratory model the effect of a key environmental influence, namely the seasonal changes of day-length. The phenotypic changes are carefully analysed and well-controlled. The mutational analysis of the agrp1 by a ko-mutant provides important evidence to support the conclusions. Thus this report exceeds previous findings on the function of agrp1 and npyb as regulators of food-intake and shows how in medaka these genes are involved in regulating the organismal response to an environmental change. It thus furthers our understanding of how animals react to key exogenous stimuli for adaptation.

      Weaknesses:

      The authors are too modest when it comes to underscoring the importance of their findings. Previous animal models used to study the effect of these neuropeptides on feeding behaviour have either lost or were most likely never sensitive to seasonal changes of day length. Considering the key importance of this parameter on many aspects of plant and animal life it could be better emphasised that a suitable animal model is at hand that permits this. The molecular characterization of the agrp1 ko-mutant that the authors have generated lacks some details that would help to appreciate the validity of the mutant phenotype. Additional data would help in this respect.

      We would like to thank Reviewer #1 for the really constructive advice. In the revised manuscript, we provided more information on the molecular characterization of the agrp1 KO-mutant and emphasized the importance of our present animal model, which permits the analysis of neuropeptide effects on feeding behavior in response to seasonal changes of day length.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the mechanisms behind breeding season-dependent feeding behavior using medaka, a well-known photoperiodic species, as a model. Through a combination of molecular, cellular, and behavioral analyses, including tests with mutants, they concluded that AgRP1 plays a central role in feeding behavior, mediated by ovarian estrogenic signals.

      Strengths:

      This study offers valuable insights into the neuroendocrine mechanisms that govern breeding season-dependent feeding behavior in medaka. The multidisciplinary approach, which includes molecular and physiological analyses, enhances the scientific contribution of the research.

      Weaknesses:

      While medaka is an appropriate model for studying seasonal breeding, the results presented are insufficient to fully support the authors' conclusions.

      Specifically, methods and data analyses are incomplete in justifying the primary claims:

      - the procedure for the food intake assay is unclear;

      - the sample size is very small;

      - the statistical analysis is not always adequate.

      Additionally, the discussion fails to consider the possible role of other hormones that may be involved in the feeding mechanism.

      We would like to thank Reviewer #2 for the helpful comments. As the reviewer suggested, we revised the paragraph describing the procedure for the food intake assay to make it much easier for readers to understand. In Figure 1-Supplementary figure 2, RNAseq was performed only to search for candidate neuropeptides, which is why the sample size was kept to the minimum. In the other experiments, each group consists of n ≥ 5 samples, which is usually accepted as an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017). As for the statistical analyses, we revised our manuscript so that readers can be convinced of their validity.

      Reviewer #3 (Public review):

      Summary:

      Understanding the mechanisms whereby animals restrict the timing of their reproduction according to day length is a critical challenge given that many of the most relevant species for agriculture are strongly photoperiodic. However, the principal animal models capable of detailed genetic analysis do not respond to photoperiod so this has inevitably limited progress in this field. The fish model medaka occupies a uniquely powerful position since its reproduction is strictly restricted to long days and it also offers a wide range of genetic tools for exploring, in depth, various molecular and cellular control mechanisms.

      For these reasons, this manuscript by Tagui and colleagues is particularly valuable. It uses the medaka to explore links bridging photoperiod, feeding behaviour, and reproduction. The authors demonstrate that in female, but not male medaka, photoperiod-induced reproduction is associated with an increase in feeding, presumably explained by the high metabolic cost of producing eggs on a daily basis during the reproductive period. Using RNAseq analysis of the brain, they reveal that the expression of the neuropeptides agrp and npy that have been previously implicated in the regulation of feeding behaviour in mice are upregulated in the medaka brain during exposure to long photoperiod conditions. Unlike the situation in mice, these two neuropeptides are not co-expressed in medaka neurons, and food deprivation in medaka led to increases in agrp but also a decrease in npy expression. Furthermore, the situation in fish may be more complicated than in mice due to the presence of multiple gene paralogs for each neuropeptide. Exposure to long-day conditions increases agrp1 expression in medaka as the result of increases in the number of neurons expressing this neuropeptide, while the increase in npyb levels results from increased levels of expression in the same population of cells. Using ovariectomized medaka and in situ hybridization assays, the authors reveal that the regulation of agrp1 involves estrogen acting via the estrogen receptor esr2a. Finally, a loss of agrp1 function mutant is generated where the female mutants fail to show the characteristic increase in feeding associated with long-day enhanced reproduction as well as yielding reduced numbers of eggs during spawning.

      Strengths:

      This manuscript provides important foundational work for future investigations aiming to elucidate the coordination of photoperiod sensing, feeding activity, and reproduction function. The authors have used a combination of approaches with a genetic model that is particularly well suited to studying photoperiodic-dependent physiology and behaviour. The data are clear and the results are convincing and support the main conclusions drawn. The findings are relevant not only for understanding photoperiodic responses but also provide more general insight into links between reproduction and feeding behaviour control.

      Weaknesses:

      Some experimental models used in this study, namely ovariectomized female fish and juvenile fish have not been analysed in terms of their feeding behaviour and so do not give a complete view of the position of this feeding regulatory mechanism in the context of reproduction status. Furthermore, the scope of the discussion section should be expanded to speculate on the functional significance of linking feeding behaviour control with reproductive function.

      We would like to thank Reviewer #3 for the insightful advice. We added several pertinent sentences describing the ovariectomized female fish and juvenile fish, so that our revised manuscript gives a more complete view of their feeding regulatory mechanism in the context of reproduction status. In addition, we revised the discussion section to incorporate the valuable suggestion of Reviewer #3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General: the text could profit from a careful editing of errors, including adjusting singular and plural status of nouns and verbs; examples are line 107 (noun) and line 96 (verb). Suitable text editing software is available to do this task.

      Thank you for your suggestion. We thoroughly read the entire manuscript and corrected such errors in the revised manuscript.

      As medaka is a unique genetic vertebrate model to study seasonal effects, it would be interesting to know whether the authors found novel or rather unexpected genes with a differential expression between LD and SD. It is understandable that the authors focused on agrp1 and npyb, as these have already been well studied in mammalian models although not in this context. Novel insights with genes previously not implicated in feeding regulation could underscore the unique nature of medaka as a model.

      We appreciate your kind comments, which we found really encouraging to us. Since we focused on feeding-related peptides, we did not find any novel genes that have not been reported.

      ISH is unreliable as a methodology to quantify expression levels. Yet the authors use this to compare fed and starved females to compare expression levels of agrp1. They use a temporal staining comparison and compare 90-minute and 300-minute staining reactions. However, they do not explain why they use the 90-minute staining time point and why 300 minutes of staining is the "saturation point of staining". They should provide compelling data for their claim and the selection of time points or else refrain from using these (at best) semi-quantitative ISH and provide more detailed (using serial sections) data to quantify the number of expressing cells.

      Anyhow, the quantification of mRNA expression levels may not be that significant when trying to compare different states of gene function, as translational and post-translational steps can have large effects on gene function. This should be discussed adequately.

      Thank you very much for your comments. We conducted ISH using medaka kept under LD or SD, not medaka under fed or starved conditions. In addition, our previous study demonstrated that the slope of the increase in the number of cells stained by ISH also differs when there is a difference in expression level (Mitani et al., 2010). Although we do not have quantitative data on cell numbers, we confirmed in preliminary experiments that the number of cells expressing agrp1 was saturated at around 300 min, and we therefore terminated the chromogenic reactions at 300 min. On this basis, we compared the ratio of stained cells at 90 min (beginning of coloring) to those at 300 min (saturation). However, since this analysis may not be worth discussing in detail, we moved this part to the supplementary figure as the reviewer suggested.

      The molecular characterization of the agrp1 ko mutant is a bit thin.

      Line 221: "We obtained agrp1<sup>−/−</sup> medaka, which has lots of amino acid changes in functional site for AgRP1" is a bit vague as a description for the ko-mutation. It would be really helpful if the authors could provide a scheme showing the wt protein with the relevant functional sites alongside the presumptive mutant protein.

      How did the authors verify the molecular nature of their mutation? They should use suitable antibodies and western-blot analysis (maybe reagents from Shainer et al., 2019 work in medaka); in case this is not possible they could isolate & clone the mutant transcript and use in-vitro translation systems to show that the presumptive mutant protein can actually be translated from this transcript. Another strategy could be to use a second non-allelic and (hopefully) non-complementing mutation (ko1/ko2 heterozygots for example) to show that ko-mutation acts the way the authors presume. The authors mention agrp1 ko medaka lines (plural!) in line 520, thus they may have an additional ko allele at hand.

      Thank you very much for your comments. We explained the mutation site in Figure 6-Supplementary Figure 1 (A: DNA sequences and B: predicted amino acid sequences of WT and mutants). In addition, we added immunohistochemistry data of WT and mutant using an anti-AgRP antibody (Figure 6-Supplementary Figure 1C). While AgRP-immunoreactive signals were observed in WT, they were not observed in agrp1<sup>−/−</sup>. This result suggests that AgRP1 is not functional in agrp1<sup>−/−</sup>.

      Presumably, the authors analysed heterozygous agrp1<sup>+/−</sup> females and found they are as wt. If so the authors should say so.

      Yes, we analyzed food intake of agrp1<sup>+/−</sup>. We added a supplementary figure (Figure 6-Supplementary Figure 2) and a sentence in L. 233-234.

      How about agrp1<sup>−/−</sup> medaka males: do they show a discernible phenotype?

      We analyzed the phenotypes of agrp1<sup>−/−</sup> males but did not describe the results, since the present paper only focused on female-specific feeding behavior.

      agrp1<sup>−/−</sup> females show no significant sensitivity of food intake to day length (Figure 6C). Does their (reduced) oocyte production react to day length? With other words: how much of the seasonal sensitivity is left in agrp1<sup>−/−</sup> females. The authors suggest that E2 acts upstream of agrp1 and therefore some seasonality may still be left in agrp1<sup>−/−</sup> females.

      Although agrp1<sup>−/−</sup> females appear to display abnormal seasonality of food intake, agrp1<sup>−/−</sup> females spawn in LD but not in SD, indicating that the seasonality of gonadal maturation still remains in agrp1<sup>−/−</sup> females.

      The authors show that fshb and lhb are downregulated in agrp1<sup>−/−</sup> females. Is this also the case in wt females at SD?

      Thank you very much for your comment. As described above, agrp1<sup>−/−</sup> females can spawn, which suggests that the mechanism for the downregulation of gonadotropins in agrp1<sup>−/−</sup> may be different from that in SD females.

      Figure 1_Supplementary Figure 2: the trends are visible in B and C, however, there is quite some variance between LD1, 2, and 3; the same for SD 1, 2, and 3. Can the authors give an explanation for this?

      Since the data for LD1, 2, and 3 (SD1, 2, and 3) were obtained from different individual fish, the variance may be reasonable. We conducted expression analyses by using RNA-seq to find candidate genes that show larger differences than individual ones.

      Figure 7E: the ovaries are difficult to see and the size bar in the wt picture is missing.

      Thank you very much for your comments. We added a scale bar in the wt picture.

      509 ff: the authors do not describe what exactly the "sham operation" encompasses: were the females just anesthetised or was there an actual operation without removing the ovaries?

      The sham operation group was anesthetized, received an abdominal incision without removing the ovaries, and received skin suture by using a silk thread. We added this explanation in the Method section.

      519 ff: was the agrp1<sup>−/−</sup> ko induced in the d-rR strain to have the same genetic background as the wt fish?

      Exactly. As the reviewer pointed out, the genetic background of agrp1<sup>−/−</sup> was the same as that of WT.

      Minor points (Text edits):

      Line 42: change "when" into "where".

      Line 54: "under the fixed appropriate ambient temperature" change into "while keeping an appropriate temperature constant".

      Line 55: here it would be good to briefly explain what long-day and short-day is so that the reader has an idea about the changes required without having to scroll down to the M&M section. For example LD 14/10 light-dark cycle, SD 10/14 light-dark cycle.

      Line 88: change "measurement" into "measuring".

      Line 96 change eats -> eat.

      Line 107 change female -> females.

      We deeply appreciate the reviewer’s suggestions described above. We corrected them as the reviewer suggested (L. 42, L. 54, L. 55, L. 89, L. 96, L. 107).

      Line 144-145: the sentence "since hypothalamic npy control..." does not make sense. Please correct.

      Thank you very much for your suggestion. We corrected the sentence so that it makes sense (L. 145-146).

      Line 180 and 185: the term here should be "LD induced sexual activity" rather than maturity. Age is the main determinant of maturity whereas light (LD) determines activity, in other words SD females are sexually mature if they are post-puberty stage.

      Thank you very much for your suggestion. Since the phrase “LD-induced sexual maturity” confused the reviewer, we changed it to “substance(s) from LD-induced mature ovary” or “ovarian maturity”. Even though SD females are at the post-puberty stage, their ovaries are immature and do not possess mature oocytes (L. 181).

      Line 222: the authors should include the relevant information about the females: presumably agrp1.

      In Line 226-228, we explained the phenotypes of agrp1 knockout and added information for AgRP1 protein in Figure 6-Supplementary figure 1C.

      Lines 449 ff: authors should state that the analysis was done in females, instead of just writing "medaka". This is also in line with the preceding paragraph of the M&M section.

      Thank you very much for your suggestions. We corrected the sentence as the reviewer suggested (L. 469).

      Line 305: change like other mammals -> like in mammals.

      Thank you very much for your suggestion. We corrected the sentence as the reviewer suggested (L. 320).

      Reviewer #2 (Recommendations for the authors):

      (1) The procedure of the food intake assay is not clear.

      - Habituation Period: Medaka were placed into a white cup containing 100 mL of water and allowed to habituate for 5 minutes. However, is 5 minutes sufficient to reduce stress in the fish? A stressed fish does not exhibit the same feeding behavior as an unstressed one.

      Thank you for your comment. We confirmed that 5 minutes is enough for habituation in medaka, since medaka swim freely within a few minutes of being transferred from the tank and show normal feeding behavior.

      - Feeding Protocol: Medaka were fed with 200 μL aliquots of brine shrimp-containing water. This procedure was repeated multiple times. How many times was this feeding procedure repeated? Was it 3, 10, or 100 times?

      Although there was a small variation in each trial, we usually applied tubes about 5 times or so.

      - Brine Shrimp Counting: You collected 10 mL of the breeding water to count the number of uneaten brine shrimp. Can you confirm that sampling 10% of the total volume is representative? Were any tests conducted to validate this? Given that you developed an automated tool to count the brine shrimp, why didn't you count them in all 100 mL?

      The reason for collecting 10 mL is to collect the leftover shrimp as quickly as possible. Ten minutes after the start of the experiment, we quickly placed a magnetic bar to stir the breeding water so that the shrimp concentration would be uniform. Then we collected a 10 mL aliquot from the experimental cup by using a micropipette. In preliminary trials, we added shrimp, in almost the same amount as that given to WT medaka in LD, to a white cup containing 100 mL water, divided the water into 10 mL and 90 mL aliquots, and separately counted the number of shrimp in each aliquot. Here, we confirmed that the variance between the number estimated from the 10 mL aliquot and the number counted in the total volume of 100 mL falls within the range of the variance of the total applied shrimp. Thus, our present counting method can be considered reasonable.

      - Brine Shrimp Aliquot Measurement: You mentioned counting the number of brine shrimp in the 200 μL solution three times before and after the experiments. What does this mean? Did you use this procedure to calculate the mean number of brine shrimp in each 200 μL aliquot?

      Thank you for your comment. As the reviewer commented, to calculate the mean number of brine shrimp in each 200 µL aliquot, we counted the number of brine shrimp in the 200 µL solution three times before and after the experiments.
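
      The counting arithmetic described in the two responses above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name, the example counts, and the simplification that the cup still holds 100 mL after feeding are all hypothetical.

```python
from statistics import mean


def estimate_food_intake(aliquot_counts, n_aliquots_fed, leftover_in_sample,
                         sample_ml=10.0, total_ml=100.0):
    """Estimate shrimp eaten as (shrimp supplied) - (shrimp left over).

    aliquot_counts     -- shrimp counted in individual 200 uL aliquots
                          (e.g. three counts before and three after the trial)
    n_aliquots_fed     -- number of 200 uL aliquots given to the fish
    leftover_in_sample -- shrimp counted in the stirred 10 mL sample
    Assumption: the small volume added by feeding is ignored, so the
    leftover count is scaled by total_ml / sample_ml.
    """
    shrimp_per_aliquot = mean(aliquot_counts)
    supplied = shrimp_per_aliquot * n_aliquots_fed
    leftover = leftover_in_sample * (total_ml / sample_ml)
    return supplied - leftover


# Hypothetical numbers, chosen only to show the calculation.
counts = [48, 52, 50, 49, 51, 50]
print(estimate_food_intake(counts, n_aliquots_fed=5, leftover_in_sample=8))
```

      Scaling a well-stirred subsample keeps the collection step short, at the cost of sampling error that grows as the leftover count becomes small.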

      - How did you normalize the food intake data? This procedure is not detailed in the methods section.

      Thank you very much for pointing it out. We normalized food intake by subtracting the amount of shrimp by the average of those in LD or WT fish. This explanation was added in the Method section (L. 439).

      (2) Sample Size. Various tests were conducted with a low number of medaka (e.g., 2 brains for RNA-seq, 8 females for ovariectomy). Are these sample sizes sufficient to draw reliable conclusions?

      In Figure 1-Supplementary figure 2, RNAseq was performed only to search for candidate neuropeptides, which is why the sample size was kept to the minimum; we pooled two brains as one sample and used three samples per group. In the other experiments, each group consists of n ≥ 5 samples, which is usually accepted as an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017).

      (3) Statistical Analysis.

      - The authors used both parametric and non-parametric tests but did not specify how they assessed the normal distribution of the data. For example, if I understood correctly, a t-test was used to compare a small dataset (n=3). In such cases, a U-test would be more appropriate.

      Thank you for your comment. In Figure 1-Supplementary Figure 2C, the graphs are shown only to illustrate the candidate genes. To avoid misunderstanding, we deleted the statistical statements from that panel.

      - It is unclear why the Steel-Dwass test was used instead of the Kruskal-Wallis test for comparing agrp1 and npyb expressions in control, OVX, and E2-administered medaka.

      While the authors mentioned using non-parametric tests, they did not specify in which contexts or conditions they were applied.

      Thank you very much for your comment. Kruskal-Wallis test statistically shows whether or not there are differences among any of three groups. To perform multiple comparisons among the three groups, we used Steel-Dwass test.
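
      To illustrate the distinction drawn above, the following minimal sketch computes the Kruskal-Wallis omnibus statistic for three groups in plain Python. It is not the analysis used in the study: the function names and data are hypothetical, and the p-value relies on the chi-square approximation with 2 degrees of freedom, which is rough for small samples. The result says only whether some difference exists among the groups; identifying which pairs differ requires a separate multiple-comparison procedure such as the Steel-Dwass test, which is not implemented here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups; assumes not all values are identical."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Standard correction for ties.
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p


# Hypothetical expression values for three groups.
h, p = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h, p)
```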

      - The results section lacks details on the statistical tests used, including the specific test (e.g., Z, U, or W values) and degrees of freedom.

      Thank you for your comment. As the reviewer pointed out, we added such statements in all the figure legends containing statistics.

      (4) Previous studies have shown that photoperiod treatments alter the production of various hormones in medaka (e.g., Lucon-Xiccato et al., 2022; Shimmura et al., 2017), some of which, like growth hormone (GH), have been shown to influence feeding behavior (Canosa et al., 2007).

      In your RNA-seq analysis, did you observe any changes in the expression of genes involved in other hormone synthesis pathways, such as pituitary hormones (GH and TSH), leptin, or ghrelin (e.g., see Volkoff, 2016; Blanco, 2020; Bertolucci et al., 2019)?

      Including such evidence in the discussion would provide a broader perspective on the hormonal regulation of food intake in medaka.

      We appreciate your constructive comments. Unfortunately, since we performed RNA-seq using the whole brain after removal of the pituitary, we could not check such changes in the expression of pituitary hormone-related genes. As additional information about the feeding-related hormones, leptin did not show significant difference in our RNA-seq analysis, and we could not analyze ghrelin because ghrelin has not been annotated in medaka (NCBI and ensembl).

      Reviewer #3 (Recommendations for the authors):

      There are some parts of the study that need to be developed further in order to provide a more comprehensive analysis.

      (1) In the juvenile as well as ovariectomized female fish, the authors should confirm experimentally whether day length influences feeding activity.

      Thank you very much for your suggestion. We analyzed the feeding behavior of juveniles (Figure 4-Supplementary Figure 1) and OVX females (Figure 5-Supplementary Figure 1). As shown in these figures, food intake in juveniles and OVX females was not significantly different between LD and SD.

      (2) More discussion as to the relevance of increasing feeding activity to support reproductive functions such as sustained egg production would be valuable. One assumes the metabolic costs of producing eggs on a daily basis in this species would inevitably require increased food intake. Is this a reasonable prediction?

      We deeply appreciate your suggestion. We strongly agree with this argument, and we added such discussion in “Discussion” section (L. 406-408).

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We appreciate the editor’s suggestion. We added P-values in the main manuscript wherever statistical analyses were performed. In addition, we described the test statistics in the figure legends. We did not use df values for the statistics used in the present analyses, and therefore did not describe them in the main text.

    1. Author response:

      We will revise the statements of novelty in the introduction by more clearly emphasizing how our model addresses gaps in the existing literature. In addition, we will clarify the description of the dispersal process. Briefly, we use the same dispersal gene β to represent the likelihood an individual will either leave or join a group, thereby quantifying both dispersal and immigration using the same parameter. Specifically, individuals with higher β are more likely to remain as floaters (i.e., disperse from their natal group to become a breeder elsewhere), whereas those with lower β are either more likely to remain in their natal group as subordinates (i.e., queue in a group for the breeding position) or join another group if they dispersed. Immigrants that join a group as a subordinate help and queue for a breeding position, as does any natal subordinate born into the group. To follow the suggestion of the referee and more fully explore the impact of competition between subordinates born in the group and subordinate immigrants, we will explore extending our model to allow dispersers to leave their natal group and join another as subordinates, by incorporating a reaction norm based on their age or rank (D = 1 / (1 + exp(β<sub>t</sub> * t – β<sub>0</sub>))). This approach will allow individuals to also adjust their dispersal strategy to their competitiveness and to avoid kin competition by remaining as a subordinate in another group.
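
      The shape of the proposed reaction norm can be seen in a minimal numerical sketch. The function name and the parameter values below are hypothetical, chosen only to show how D varies with t (age or rank); they are not taken from the model.

```python
import math


def reaction_norm(t, beta_t, beta_0):
    """Evaluate D = 1 / (1 + exp(beta_t * t - beta_0)).

    With beta_t > 0, D decreases from near 1 toward 0 as t grows and
    equals 0.5 where beta_t * t == beta_0. Very large exponents would
    overflow math.exp; that case is not handled in this sketch.
    """
    return 1.0 / (1.0 + math.exp(beta_t * t - beta_0))


# Hypothetical parameter values.
for t in range(6):
    print(t, round(reaction_norm(t, beta_t=1.0, beta_0=2.0), 3))
```

      Under this form, the sign of β<sub>t</sub> sets whether D falls or rises with t, and β<sub>0</sub> shifts the point at which D crosses 0.5.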

      We apologize that there was some confusion with terminology. We use the term “disperser” to describe individuals that disperse from their natal group. Dispersers can assume one of three roles: (1) they can migrate to another group as "subordinates"; (2) they can join another group as "breeders" if they successfully outcompete other candidates; or (3) they can remain as "floaters" if they fail to join a group. "Floaters" are individuals who persist in a transient state without access to a breeding territory, waiting for opportunities to join a group in an established territory. Therefore, dispersers do not work when they are floaters, but they may later help if they immigrate to a group as a subordinate. Consequently, immigrant subordinates have no inherent competitive advantage over natal subordinates (as step 2.2. “Join a group” is followed by step 3. “Help”, which occurs before step 5. “Become a breeder”). Nevertheless, floaters can potentially outcompete subordinates of the same age if they attempt to breed without first queuing as a subordinate (step 5) when subordinates are engaged in work tasks. We believe that this assumption is realistic and constitutes part of the costs associated with work tasks. However, floaters are at a disadvantage for becoming a breeder because: (1) floaters incur higher mortality than individuals within groups (eq. 3); and (2) floaters may only attempt to become breeders in some breeding cycles (versus subordinate group members, who are automatically candidates for an open breeding position in the group in each cycle). Therefore, due to their higher mortality, floaters are rarely older than individuals within groups, which heavily influences dominance value and competitiveness. Additionally, any competitive advantage that floaters might have over other subordinate group members is unlikely to drive the kin selection-only results because subordinates would preferably choose defense tasks instead of work tasks so as not to be at a competitive disadvantage compared to floaters.

      We note that reviewers also mention that floaters usually aren't high resource holding potential (RHP) individuals and, therefore, our assumptions might be unrealistic. As we explain above, floaters are not inherently at a competitive advantage in our model. In any case, empirical work in a number of species has shown that dispersers are not necessarily those of lower RHP or of lower quality. In fact, according to the ecological constraints hypothesis, one might predict that high quality individuals are the ones that disperse because only individuals in good condition (e.g., larger body size, better energy reserves) can afford the costs associated with dispersal (Cote et al., 2022). By adding a reaction norm approach to explore the role of age or rank in the revised version, we can also determine whether higher or lower quality individuals are the ones dispersing. We will address the issues of terminology and clarity of the relative competitive advantage of floaters versus subordinates, and also include more information in the Supplementary Tables (e.g., the number of floaters). As a side note, the “scramble context” we mention was an additional implementation that we decided to remove from the final manuscript, but we forgot to remove it from Table 1 before submission.

      The reviewers also raised a question about asexual reproduction and relatedness more generally. As we showed in the Supplementary Tables and the section on relatedness in the SI (“Kin selection and the evolution of division of labor"), high relatedness does not appear to explain our results. In evolutionary biology generally and in game theory specifically (with the exception of models on sexual selection or sex-specific traits), asexual reproduction is often modelled because it reduces unnecessary complexity. To further study the effect of relatedness on kin structures more closely resembling those of vertebrates, however, we will create an additional “relatedness structure level”, where we will shuffle half of the philopatric offspring using the same method used to remove relatedness completely. This approach will effectively reduce relatedness structure by half and overcome the concerns with our decision to model asexual reproduction.

      Briefly, we will elaborate on the concept of division of labor and the tasks that cooperative breeders perform. In nature, multiple tasks are often necessary to successfully rear offspring. For example, in many cooperatively breeding birds, the primary reasons that individuals fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are necessary to successfully produce offspring, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by individuals within the group. We simplify this principle in the model by maximizing reproductive output when both tasks are carried out to a similar extent, allowing for some flexibility from the mean. In response to the reviewer's suggestion about making fecundity a function of work tasks and offspring survival a function of defensive tasks, these are actually equivalent in model terms, as it is the same whether breeders produce three offspring and two die, or if they only produce one. This represents, of course, a simplification of the natural context, where breeding unsuccessfully is more costly (in terms of time and energy investment) than not breeding at all, but this approach is typically used in models of this sort.

      The scope of this paper was to study division of labor in cooperatively breeding species with fertile workers, in which help is exclusively directed towards breeders to enhance offspring production (i.e., alloparental care). Our focus is in line with previous work in most other social animals, including eusocial insects and humans, which emphasizes how division of labor maximizes group productivity. Other forms of “general” help are not considered in the paper, and such forms of help are rarely considered in cooperatively breeding vertebrates or in the division of labor literature, as they do not result in task partitioning to enhance productivity.

      How do we model help? Help provided is an interaction between H (total effort) and T (proportion of total effort invested in each type of task). We will make this definition clearer in the revised manuscript. Thank you for pointing out an error in Eq. 1. This inequality was indeed written incorrectly in the paper (but is correct in the model code); it is dominance rank instead of age (see code in Individual.cpp lines 99-119). We will correct this mistake in the revision.

      There was also a question about bounded and unbounded helping costs. The difference in costs is inherent to the nature of the different tasks (work or defense): while survival is naturally bounded, with death as the lower bound, dominance costs are potentially unbounded, as they are influenced by dynamic social contexts and potential competitors. Therefore, we believe that the model’s cost structure is not too different from that in nature.

      Thank you for your comments about the parameter landscape. It is important to point out that variations in the mutation rate do not qualitatively affect our results, as this is something we explored in previous versions of the model (not shown). Briefly, we find that variations in the mutation rates only alter the time required to reach equilibrium. Increasing the step size of mutation diminishes the strength of selection by adding stochasticity and reducing the genetic correlation between offspring and their parents. Population size could, in theory, affect our results, as small populations are more prone to extinction. Since this was not something we planned to explore in the paper directly, we specifically chose a large population size, or better said, a large number of territories (i.e. 5000) that can potentially host a large population.

      During the exploratory phase of the model development, various parameters and values were also assessed. However, the manuscript only details the ranges of values and parameters where changes in the behaviors of interest were observed, enhancing clarity and conciseness. For instance, variation in y<sub>h</sub> (the cost of help on dominance when performing “work tasks”) led to behavioral changes similar to those caused by changes in x<sub>h</sub> (the cost of help in survival when performing “defensive tasks”), as both are proportional to each other. Specifically, since an increase in defense costs raises the proportion of work relative to defense tasks, while an increase in the costs of work task has the opposite effect, only results for the variation of x<sub>h</sub> were included in the manuscript to avoid redundancy. We will make this clearer in the revision.

      Finally, following the advice from the reviewers, we will add the symbols of the variables to the figure axes, and clarify whether the values shown represent a genetic or phenotypic trait. In Figure 2, the x-axis is H and the y-axis is T. In Figure 3A, the subindex t on the x-axis is incorrect; it should be subindex R (reaction norm to dominance rank instead of age), and the y-axis is T. In Figure 3B, the x-axis is R, and the y-axis is T. All values of T, H and R are phenotypically expressed values (see Table 1). For instance, T values are the phenotypically expressed values of the individuals in the population according to their genetic gamma values and their current dominance rank at a given time point.

      References

      Cote, J., Dahirel, M., Schtickzelle, N., Altermatt, F., Ansart, A., Blanchet, S., Chaine, A. S., De Laender, F., De Raedt, J., & Haegeman, B. (2022). Dispersal syndromes in challenging environments: A cross‐species experiment. Ecology Letters, 25(12), 2675–2687.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      We would like to clarify that MaMBA was specifically designed to address and overcome the limitations imposed by relying on primary antibodies’ Fc types for multiplexing. MaMBA utilizes DNA oligo-conjugated nanobodies that selectively and monovalently bind to the Fc region of IgG. This key feature allows us to barcode primary IgGs targeting different antigens independently. These barcoded IgGs can then be pooled together after barcoding, effectively minimizing the potential for cross-reactivity or crossover. Therefore, IgGs barcoded using MaMBA are functionally equivalent to those barcoded via conventional direct conjugation approaches with respect to multiplexing capability.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      As we have responded above, MaMBA barcoding of primary IgGs that target various antigens can be conducted separately. Once barcoded, these IgGs can then be combined into a single pool. Therefore, for BLISA (i.e., the barcode-ELISA/NGS-based technique), IgGs barcoded through MaMBA offer the same multiplexing capability as those barcoded using traditional direct conjugation methods.

      In in situ protein imaging, spectral overlap can indeed limit the throughput of multiplexed HCR fluorescent imaging. There are two strategies to address this challenge. As demonstrated in this work with misHCR and misHCRn, removing the HCR amplifiers allows for multiplexed detection using a limited number of fluorescence wavelengths. This is achieved through sequential rounds of HCR amplification and imaging. Alternatively, recent computational approaches offer promising solutions for “one-shot” multiplexed imaging. These include combinatorial multiplexing (PMID: 40133518) and spectral unmixing (PMID: 35513404), which can be applied to misHCR to deconvolute overlapping spectra and increase multiplexing capacity in a single imaging acquisition.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Nuclear depletion and cytoplasmic mislocalization/aggregation of the DNA and RNA binding protein TDP-43 are pathological hallmarks of multiple neurodegenerative diseases. Prior work has demonstrated that depletion of TDP-43 from the nucleus leads to alterations in transcription and splicing. Conversely, cytoplasmic mislocalization/aggregation can contribute to toxicity by impairing mRNA transport and translation as well as miRNA dysregulation. However, to date, models of TDP-43 proteinopathy rely on artificial knockdown- or overexpression-based systems to evaluate either nuclear loss or cytoplasmic gain of function events independently. Few model systems authentically reproduce both nuclear depletion and cytoplasmic mislocalization/aggregation events. In this manuscript, the authors generate novel iPSC-based reagents to manipulate the localization of endogenous TDP-43. This is a valuable resource for the field to study pathological consequences of TDP-43 proteinopathy in a more endogenous and authentic setting. However, in the current manuscript, there are a number of weaknesses that should be addressed to further validate the ability of this model to replicate human disease pathology and demonstrate utility for future studies.

      Strengths:

      The primary strength of this paper is the development of a novel in vitro tool.

      Weaknesses:

      There are a number of weaknesses detailed below that should be addressed to thoroughly validate these new reagents as more authentic models of TDP-43 proteinopathy and demonstrate their utility for future investigations.

      (1) The authors should include images of their engineered TDP-43-GFP iPSC line to demonstrate TDP-43 localization without the addition of any nanobodies (perhaps immediately prior to addition of nanobodies). Additionally, it is unclear whether simply adding a GFP tag to endogenous TDP-43 impacts its normal function (nuclear-cytoplasmic shuttling, regulation of transcription and splicing, mRNA transport, etc.).

      We have included images of the untransduced day 20 MNs derived from the engineered TDP43-GFP iPSC lines and the unedited line (Supplementary Fig. 1B).

      We acknowledge the reviewer’s concern about the potential impact of the GFP tag on TDP43's normal function. To address this, we have validated the functionality of TDP43 by assessing the inclusion of cryptic exons in highly sensitive targets such as UNC13A and STMN2, both of which are known to be directly regulated by TDP43.

      We compared MNs derived from the unedited parent line with the TDP43-GFP MNs prior to nanobody addition. As measured by qPCR, cryptic exon inclusion in UNC13A and STMN2 was not observed in the unedited or edited TDP43-GFP MNs (Supplementary Fig. 1C), confirming that the tagging does not induce splicing defects by itself. Cryptic exon inclusion in UNC13A and STMN2 was only observed in TDP43-GFP MNs expressing the NES nanobody (Supplementary Fig. 2D). These findings were further supported by our next-generation sequencing data, which also showed that cryptic exon inclusion was specific to the TDP43 mislocalization condition (Supplementary Fig. 3 and 4).

      Thus, we have strong evidence that the GFP-tagged TDP43 behaves similarly to the wild-type protein and does not interfere with its function in our model.

      (2) Can the authors explain why there is a significant discrepancy in time points selected for nanobody transduction and immunostaining or cell lysis throughout Figure 1 and 2? This makes interpretation and overall assessment of the model challenging.

      For the phenotypic data shown in Fig. 1, we added the AAVs at day 18 or 20 and analyzed the cells at day 40. For the phosphorylated TDP43 western blot (revised Fig. 3D), cells were treated with doxycycline at day 20 to induce nanobody expression and samples were harvested at day 40. Thus, cells were harvested 20 to 22 days after adding the nanobodies. Transgene expression from AAVs in neurons typically displays slow onset kinetics. We observed TDP43 mislocalization in fewer than 50% of the neurons at 7 days post-transduction; mislocalization peaked 10-12 days after addition of the nanobodies, when more than 80% of the cells displayed TDP43 mislocalization. Hence, we do not believe that a two-day difference significantly alters the interpretation of the data.

      We chose to harvest neurons at day 30 for the qPCR data to investigate whether the splicing changes seen at day 40 in the transcriptomics analysis can be detected well before the phenotypes observed at day 40.

      (3) The authors should further characterize their TDP-43 puncta. TDP-43 immunostaining is typically punctate so it is unclear if the puncta observed are physiologic or pathologic based on the analyses carried out in the current version of this manuscript. Additionally, do these puncta co-localize with stress granule markers or RNA transport granule markers? Are these puncta phosphorylated (which may be more reminiscent of end-stage pathologic observations in humans)?

      We tried immunostaining neurons for phosphorylated TDP43, but these attempts were unsuccessful. Depending on the antibody, we either saw no signal (antibody from Cosmo Bio, TIP-PTD-M01A) or even the control neurons displayed detectable phosphorylation within the nucleus (antibody from Proteintech, 22309-1-AP). Consequently, we performed western blot analysis using the antibody from Cosmo Bio (TIP-PTD-M01A), which clearly shows hyperphosphorylation of TDP43 in whole cell lysates (Fig. 3D, E). Hence, we have referred to these structures as puncta and not aggregates (Page 4).

      To assess co-localization of the puncta with stress granules, we immunostained for the stress granule marker G3BP1. This was done in MNs that were treated with sodium arsenite (SA) or PBS as a control. In the PBS treated control MN cultures, TDP43 mislocalization alone did not induce stress granule formation. G3BP1+ stress granules were only observed following SA stress (0.5 mM, 60 minutes). Further, only a subset of TDP43 puncta overlapped with these stress granules (Supplementary Fig. 7) (Page 6).

      (4) The authors should include multiple time points in their evaluation of TDP-43 loss of function events and aggregation. Does loss of function get worse over time? Is there a time course by which RNA misprocessing events emerge or does everything happen all at once? Does aggregation get worse over time? Do these neurons die at any point as a result of TDP-43 proteinopathy?

      We agree that a time course to analyze TDP43 mislocalization and its consequences would be ideal. However, the mislocalization of TDP43 across neurons is not a coordinated process. At any given time point, neurons display varying levels of TDP43 mislocalization. Answering the questions raised by the reviewer would require tracking individual neurons in real time in a controlled environment over weeks. Unfortunately, we currently do not have the hardware to run these experiments. However, we do observe increased levels of cleaved caspase 3 in MNs expressing the NES nanobody, indicating that these neurons indeed undergo apoptosis by day 40 (Fig. 1).

      We have, however, analyzed changes in splicing using qPCR for 12 genes over a time course starting as early as 4 hours after inducing mislocalization. We detect time-dependent cryptic splicing events in all genes as early as 8 hours after doxycycline addition, coinciding with the appearance of TDP43 mislocalization (Fig. 4A, B).

      (5) Can the authors please comment on whether or not their model is "tunable"? In real human disease, not every neuron displays complete nuclear depletion of TDP-43. Instead there is often a gradient of neurons with differing magnitudes of nuclear TDP-43 loss. Additionally, very few neurons (5-10%) harbor cytoplasmic TDP-43 aggregates at end-stage disease. These are all important considerations when developing a novel authentic and endogenous model of TDP-43 proteinopathy which the current manuscript fails to address.

      As shown in Fig. 1, the neurons expressing the NES-nanobody display a wide range of mislocalization as assessed by the % of nuclear TDP43 present. By titrating the amount of AAVs added to the culture, the model can be tuned to achieve a wide gradient of TDP43 mislocalization.

      We calculated the puncta size and the percentage of neurons displaying TDP43 puncta. The size and number of puncta vary across the neurons that display TDP43 mislocalization. Around 50% of the neurons displayed small (1 µm<sup>2</sup>) puncta, while large puncta (> 5 µm<sup>2</sup>) were observed in <10% of the cells, similar to observations in patient tissue (Fig. 1F).

      Reviewer #2 (Public Review):

      Summary:

      TDP-43 mislocalization occurs in nearly all of ALS, roughly half of FTD, and as a co-pathology in roughly half of AD cases. Both gain-of-function and loss-of-function mechanisms associated with this mislocalization likely contribute to disease pathogenesis.

      Here, the authors describe a new method to induce TDP-43 mislocalization in cellular models. They endogenously tagged TDP-43 with a C-terminal GFP tag in human iPSCs. They then expressed an intrabody - fused with a nuclear export signal (NES) - that targeted GFP to the cytosol. Expression of this intrabody-NES in human iPSC-derived neurons induced nuclear depletion of homozygous TDP-43-GFP, caused its mislocalization to the cytosol, and at least in some cells appeared to cause cytosolic aggregates. This mislocalization was accompanied by induction of cryptic exons in well characterized transcripts known to be regulated by TDP-43, a hallmark of functional TDP-43 loss and consistent with pathological nuclear TDP-43 depletion. Interestingly, in heterozygous TDP-43-GFP neurons, expression of intrabody-NES appeared to also induce the mislocalization of untagged TDP-43 in roughly half of the neurons, suggesting that this system can also be used to study effects on untagged endogenous TDP-43 as well as TDP-43-GFP fusion protein.

      Strengths:

      A clearer understanding of how TDP-43 mislocalization alters cellular function, as well as pathways that mitigate clearance of TDP-43 aggregates, is critical. But modeling TDP-43 mislocalization in disease-relevant cellular systems has proven to be challenging. High levels of overexpression of TDP-43 lacking an NES can drive endogenous TDP-43 mislocalization, but such overexpression has direct and artificial consequences on certain cellular features (e.g. altered exon skipping) not seen in diseased patients. Toxic small molecules such as MG132 and arsenite can induce TDP-43 mislocalization, but co-induce myriad additional cellular dysfunctions unrelated to TDP-43 or ALS. TDP-43 binding oligonucleotides can cause cytosolic mislocalization as well. Each system has pros and cons, and additional ways to induce TDP-43 mislocalization would be useful for the field. The method described in this manuscript could provide researchers with a powerful way to study the combined biology of cytosolic TDP-43 mislocalization and nuclear TDP-43 depletion, with additional temporal control that is lacking in current methods. Indeed, the authors see some evidence of differences in RNA splicing caused by pure TDP-43 depletion versus their induced mislocalization model. Finally, their method may be especially useful in determining how TDP-43 aggregates are cleared by cells, potentially revealing new biological pathways that could be therapeutically targeted.

      Weaknesses:

      The method and supporting data have limitations, outlined below, and in their current form the findings are rather preliminary.

      (1) Tagging of TDP-43 with a bulky GFP tag may alter its normal physiological functions, for example phase separation properties and functions within complex ribonucleoprotein complexes. In addition, alternative isoforms of TDP-43 (e.g. "short" TDP-43) would not be GFP tagged and therefore these species would not be directly manipulatable or visualizable with the tools currently employed in the manuscript.

      With reference to our answer above, we have confirmed using qPCR and RNA-seq analysis that adding a GFP tag to the C-terminus of TDP43 does not result in an appreciable loss of functionality. We do not observe any cryptic exon inclusion in STMN2 and UNC13A. Cryptic exon inclusion in these genes, especially STMN2, has been recognized as a very sensitive indicator of TDP43 loss of function (Supplementary Fig. 1C, Supplementary Fig. 2D, Fig. 3, Fig. 4).

      We acknowledge that truncated, alternatively spliced versions of TDP43 lose the GFP tag, which is positioned at the C-terminus, and therefore cannot be manipulated with our system. However, these isoforms, if present, should be detectable using the Proteintech antibody against total TDP43, which recognizes N-terminal TDP43 epitopes. Western blot analysis, even 20 days after inducing TDP43 mislocalization, showed no truncated fragments. This suggests that TDP43 mislocalization alone is insufficient to generate significant levels of truncated isoforms. We have added this section to the Limitations paragraph (page 9).

      (2) The data regarding potential mislocalization of endogenous TDP-43 in the heterozygous TDP-43-GFP lines is especially intriguing and important, yet very little characterization was done. Does untagged TDP-43 co-aggregate with the tagged TDP-43? Is localization of TDP-43 immunostaining the same as the GFP signal in these cells?

      The purpose of the heterozygous experiments was to see whether mislocalized TDP43 could potentially trap the untagged TDP43. If this were not the case, we would have seen a maximum of 50% of the TDP43 signal mislocalized to the cytoplasm. The fact that a sizeable proportion of cells had significantly higher levels of TDP43 loss from the nucleus indicates that mislocalized TDP43 can indeed trap the untagged protein fraction. We used GFP immunostaining to identify the tagged TDP43, while an antibody against the endogenous TDP43 protein was used to detect total TDP43 levels. In the cells that show near complete loss of nuclear TDP43, the total TDP43 signal coincides with the GFP (tagged TDP43) signal. We are unable to distinguish the untagged fraction selectively as we do not have an antibody that can detect this directly.

      But we agree with the reviewer that these observations need further detailed follow-up that we are unable to provide currently. Hence, we have removed this figure from the manuscript.

      (3) The experiments in which dox was used to induce the nanobody-NES, then dox withdrawn to study potential longer-lasting or self-perpetuating inductions of aggregation are potentially interesting. However, the nanobody was only measured at the RNA level. We know that protein half-lives can be very long in neurons, and therefore residual nanobody could be present at these delayed time points. The key measurement to make would be at the protein level of the nanobody if any conclusions are to be made from this experiment.

      The reviewer has highlighted an important point. To address this issue, we tagged the nanobodies with a V5 tag that allowed us to directly measure nanobody levels within cells. We indeed observed significant expression of the nanobody within cells even two weeks after Dox withdrawal. Extending the time point to three weeks allowed complete loss of the nanobody in most neurons. However, in contrast to our observations at two weeks, this was accompanied by a reversal of TDP43 mislocalization in these neurons at three weeks (Fig. 5).

      Surprisingly, in fewer than 10% of the neurons, we observed >80% of the total TDP43 still mislocalized to the cytoplasm, despite nearly undetectable levels of the nanobody. Super-resolution microscopy further revealed persistent cytoplasmic TDP43 in these neurons that did not overlap with residual nanobody signal. This suggests that in these neurons, the nanobody was no longer required to maintain TDP43 mislocalization (Fig. 5, page 7).

      (4) Potential differences in splicing and microRNAs between TDP-43 knockdown and TDP-43 mislocalization are potentially interesting. However, different patterns of dysregulated RNA splicing can occur at different levels of TDP-knockdown, thus it is difficult to assess whether the changes observed in this paper are due to mislocalization per se, or rather just reflect differences in nuclear TDP-43 abundance.

      This is a fair point. It is possible that microRNA dysregulation might require a greater loss of nuclear TDP43 and may be more resilient to TDP43 loss than splicing. We have acknowledged this in the discussion section (page 9).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be helpful to include nuclear vs cytoplasmic ratios of TDP-43 instead of simply "% nuclear TDP-43"

      We have used % nuclear TDP43 as these values have biologically meaningful upper and lower bounds, which makes it easier to compare across experiments. We found that using a ratio of nuclear vs cytoplasmic TDP43 intensities displayed higher variability and a wider range.

      We have re-labelled the y-axis as “% Nuclear TDP43 / soma TDP43” to make our quantification clearer. The conversion from % nuclear TDP43 to N/C is straightforward. If the % nuclear TDP43 is X, then the N/C ratio can be calculated as X / (100-X). For example, a % nuclear TDP43 of 80% would amount to an N/C ratio of 80/20 = 4.
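
      The conversion described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the sketch assumes that % nuclear TDP43 is the nuclear signal expressed as a percentage of the total soma signal, as stated in the response.

```python
def percent_nuclear_to_nc_ratio(percent_nuclear: float) -> float:
    """Convert % nuclear TDP43 (X) to a nuclear/cytoplasmic ratio, X / (100 - X)."""
    if not 0.0 <= percent_nuclear < 100.0:
        # At exactly 100% the cytoplasmic signal is zero and the ratio is undefined.
        raise ValueError("percent_nuclear must be in [0, 100)")
    return percent_nuclear / (100.0 - percent_nuclear)


def nc_ratio_to_percent_nuclear(nc_ratio: float) -> float:
    """Inverse conversion: N/C ratio back to % nuclear TDP43."""
    if nc_ratio < 0.0:
        raise ValueError("nc_ratio must be non-negative")
    return 100.0 * nc_ratio / (1.0 + nc_ratio)


print(percent_nuclear_to_nc_ratio(80.0))  # 4.0, matching the 80/20 example
```

      The ratio is unbounded as X approaches 100 (for instance, 99% nuclear corresponds to N/C = 99), which is consistent with the wider range the authors report for the ratio.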

      (2) The axis descriptions in Figure 1D are very unclear. While this is described better in the figure legend, it would be beneficial to have a more descriptive y-axis title in the figure (which may mean increasing the number of graphs).

      Axis descriptions and figures changed as recommended.

      (3) In Figure 1, the time points at which iPSNs were transduced with nanobody and/or fixed for immunostaining are somewhat inconsistent across all panels. This hinders interpretation of the figure as a whole. The authors should use the same transduction and immunostaining time points for consistency or demonstrate that the same phenotype is observed regardless of transduction and immunostaining day as long as the time in between (time of nanobody expression) is consistent. Subsequently, in Figure 2, a different set of time points is used.

      Please see our response in the public comments above.

      (4) In Figure 1, please show individual data points for each independent differentiation to demonstrate the level of reproducibility from batch to batch.

      Data points have been shown per replicate (Supplementary Fig. 2).

      We have refined our approach for phenotypic analysis to improve consistency across different clones. Previously, we set thresholds on % nuclear TDP43 to distinguish MNs with nuclear versus mislocalized TDP43. This was done by ranking all cells based on % nuclear TDP43 and applying quantile-based thresholds, designating the top 25% as control and the bottom 25% as mislocalized, which ensured an equal number of cells per category. However, we observed significant variability in thresholds across clones. For instance, the E8 clone had thresholds of 96% and 29%, while the E5 clone had 93% and 40%.

      To address this, we reanalysed the data using a standardized three-bin approach:

      (1) Control: MNs expressing the control nanobody.

      (2) Low-Moderate Mislocalization: MNs expressing the NES nanobody with > 40% nuclear TDP43.

      (3) Severe Mislocalization: MNs expressing the NES nanobody with < 40% nuclear TDP43.

      This approach ensured a more reliable comparison of TDP43 mislocalization effects across experiments. The conclusions remain the same.
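
      The three-bin rule can be written down as a small classifier. The sketch below is illustrative only: the label strings and function name are hypothetical, and because the response does not say how a cell at exactly 40% is handled, the sketch assigns it to the low-moderate bin as an assumption.

```python
SEVERE_THRESHOLD = 40.0  # % nuclear TDP43; fixed cutoff taken from the response


def assign_bin(nanobody: str, percent_nuclear: float) -> str:
    """Assign a motor neuron to one of the three analysis bins."""
    if not 0.0 <= percent_nuclear <= 100.0:
        raise ValueError("percent_nuclear must be in [0, 100]")
    if nanobody == "control":
        # Bin 1: all cells expressing the control nanobody, regardless of value.
        return "control"
    if nanobody != "NES":
        raise ValueError("nanobody must be 'control' or 'NES'")
    if percent_nuclear < SEVERE_THRESHOLD:
        return "severe"  # Bin 3: < 40% nuclear TDP43
    # Bin 2: > 40% nuclear TDP43 (exactly 40% placed here by assumption).
    return "low-moderate"


cells = [("control", 95.0), ("NES", 72.0), ("NES", 18.0)]
print([assign_bin(nanobody, pct) for nanobody, pct in cells])
```

      Unlike the earlier quantile-based thresholds, the cutoff here does not depend on the distribution within each clone, so the same rule applies to E5 and E8.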

      (5) In Figure 2, please show individual data points.

      Data points for all the qPCR analyses in the paper have been included as a supplementary text file.

      (6) In Figure 3, please show individual data points.

      Data points for the western blot data have been included as a supplementary data file.

      All other comments are within the public review.

      Reviewer #2 (Recommendations For The Authors):

      (1) In general, more robust quantification of many of the described phenotypes is necessary. In particular, no apparent quantification of cytosolic mislocalization was performed in Figure 1, or quantification of mislocalization of Figure 3F. It is unclear in the western blot in Fig 1G if the TDP-43 signal was normalized to total protein, and of note it seems that expression of the intrabody-NES reduced total proteins in the western blots that were shown. No quantification or measurement of the insoluble material was done or shown.

      We have quantified cytosolic mislocalization of TDP43 (Fig. 1C). The y-axis indicates the total TDP43 signal observed in the nucleus as a percentage of the total signal observed in the soma (including the nucleus). This value has the advantage of ranging from 100% (perfectly nuclear) to 0% (complete nuclear loss). The boxplots indicate that expression of the NES-nanobody results in a range of cytosolic mislocalization, with a median value of around 40% of the TDP43 remaining in the nucleus.

      Western blot data in previous Fig. 1G was normalized to alpha-tubulin. We were unable to get a good signal for the insoluble fraction. From the alpha-tubulin alone, it cannot be concluded that NES-nanobody results in a decrease in total protein levels. In the revised western blot for phosphorylated TDP43 (Fig. 3D, E), we have quantified total and phosphorylated TDP43. Here, we observe a six-fold increase in the levels of phosphorylated TDP43 without a significant change in total TDP43 protein levels.

      To avoid potential misinterpretation of our results, we have now removed the previous Fig. 1G.

      (2) Additional images of nearly all microscopy data at higher magnifications would be required to better evaluate TDP-43 localization. Ideally including images for each channel in addition to merged images, and especially for key figures such as Figure 1B, 3B, 3F.

      Better images have been provided.

      (3) No control images were shown for Figure 1F and 3F. It is unclear what the bright punctate spots of cytoplasmic TDP-43 GFP signal represent. Are these true aggregates? If so, additional characterization would be required before such conclusions can be made, beyond the relatively superficial western blot analysis that was done in Figure 1.

      Control images have now been provided (Figure 1E). As we mentioned above, immunostaining analysis to characterize whether the aggregates are phosphorylated failed to provide a clear signal. However, we have now confirmed that the mislocalized TDP43 is indeed hyper-phosphorylated (Figure 3D, E). We have acknowledged this in the main text, and have referred to these as puncta reminiscent of aggregates (Page 4, Page 6).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):

      Summary:

      This paper reports an intracranial SEEG study of speech coordination, where participants synchronize their speech output with a virtual partner that is designed to vary its synchronization behavior. This allows the authors to identify electrodes throughout the left hemisphere of the brain that have activity (both power and phase) that correlates with the degree of synchronization behavior. They find that high-frequency activity in the secondary auditory cortex (superior temporal gyrus) is correlated to synchronization, in contrast to primary auditory regions. Furthermore, activity in the inferior frontal gyrus shows a significant phase-amplitude coupling relationship that is interpreted as compensation for deviation from synchronized behavior with the virtual partner.

      Strengths:

      (1) The development of a virtual partner model trained for each individual participant, which can dynamically vary its synchronization to the participant's behavior in real-time, is novel and exciting.

      (2) Understanding real-time temporal coordination for behaviors like speech is a critical and understudied area.

      (3) The use of SEEG provides the spatial and temporal resolution necessary to address the complex dynamics associated with the behavior.

      (4) The paper provides some results that suggest a role for regions like IFG and STG in the dynamic temporal coordination of behavior both within an individual speaker and across speakers performing a coordination task.

      We thank the Reviewer for their positive comments on our manuscript.

      Weaknesses:

      (1) The main weakness of the paper is that the results are presented in a largely descriptive and vague manner. For instance, while the interpretation of predictive coding and error correction is interesting, it is not clear how the experimental design or analyses specifically support such a model, or how they differentiate that model from the alternatives. It's possible that some greater specificity could be achieved by a more detailed examination of this rich dataset, for example by characterizing the specific phase relationships (e.g., positive vs negative lags) in areas that show correlations with synchronization behavior. However, as written, it is difficult to understand what these results tell us about how coordination behavior arises.

      We understand the reviewer’s comment. As the first study in the field to combine real-time adaptive synchronous speech with intracerebral neural data, this work is indeed largely descriptive, and we hope it will pave the way for further studies. We have now added more statistical analyses (see point 2) to go beyond a descriptive approach, and we have rewritten the discussion to clarify how this work can help disentangle different models of language interaction. Most importantly, we have also run new analyses that take into account the specific phase relationship, as suggested.

      We already had an analysis using instantaneous phase difference in the phase-amplitude coupling approach, that bridges phase of behaviour to neural responses (amplitude in the high-frequency range). However, this analysis, as the reviewer noted, does not distinguish between positive and negative lags, but rather uses the continuous fluctuations of coordinative behaviour. Following the reviewer’s suggestion, we have now run a new analysis estimating the average delay (between virtual partner speech and patient speech) in each trial, using a cross-correlation approach. This gives a distribution of delays across trials that can then be “binned” as positive or negative. We have thus rerun the phase-amplitude coupling analyses on positive and negative trials separately, to assess whether the phase amplitude relationship depends upon the anticipatory (negative lags) or compensatory (positive lags) behaviour. Our new analysis (now in the supplementary, see figure below) does not reveal significant differences between positive and negative lags. This lack of difference, although not easy to interpret, is nonetheless interesting because it seems to show that the IFG does not have a stronger coupling for anticipatory trials. Rather the IFG seems to be strongly involved in adjusting behaviour, minimizing the error, independently of whether this is early or late.

      We have updated the “Coupling behavioural and neurophysiological data” section in Materials and methods as follows:  

      “In the third approach, we assessed whether the phase-amplitude relationship (or coupling) depends upon the anticipatory (negative delays) or compensatory (positive delays) behaviour between the VP and the patients’ speech. We computed the average delay in each trial using a cross-correlation approach on speech signals (between patient and VP) with the MATLAB function xcorr. A median split (patient-specific; average median split = 0 ms, average sd = 24 ms) was applied to conserve a sufficient amount of data, classifying trials below the median as “anticipatory behaviour” and trials above the median as “compensatory behaviour”. Then we conducted the phase-amplitude coupling analyses on positive and negative trials separately.”
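
      The delay estimation and median split can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB code: the function names are hypothetical, the sign convention (positive lag means the patient's signal trails the VP's) is an assumption that may differ from the argument order used with xcorr, and the handling of trials that fall exactly on the median is not specified in the text (they are left out of both groups here).

```python
from statistics import median


def estimate_delay(vp_signal, patient_signal, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means patient_signal trails vp_signal (assumed convention).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, vp_value in enumerate(vp_signal):
            j = i + lag
            if 0 <= j < len(patient_signal):
                score += vp_value * patient_signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def median_split(delays):
    """Split trial indices into below-median and above-median groups."""
    cut = median(delays)
    anticipatory = [i for i, d in enumerate(delays) if d < cut]
    compensatory = [i for i, d in enumerate(delays) if d > cut]
    return anticipatory, compensatory


# Toy envelopes: the patient's impulse occurs two samples after the VP's.
vp = [0, 0, 1, 0, 0, 0, 0, 0]
patient = [0, 0, 0, 0, 1, 0, 0, 0]
print(estimate_delay(vp, patient, max_lag=4))  # 2
```

      Because the split is made on each patient's own median, the two groups stay balanced in size even when a patient tends to lead or lag the VP overall.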

      We also added a paragraph on this finding in the Discussion:

      “Our results highlight the involvement of the inferior frontal gyrus (IFG) bilaterally, in particular the BA44 region, in speech coordination. First, trials with a weak verbal coordination (VCI) are accompanied by more prominent high frequency activity (HFa, Fig.4; Fig.S4). Second, when considering the within-trial time-resolved dynamics, the phase-amplitude coupling (PAC) reveals a tight relation between the low frequency behavioural dynamics (phase) and the modulation of high-frequency neural activity (amplitude, Fig.5B; Fig.S5). This relation is strongest when considering the phase adjustments rather than the phase of speech of the VP per se: larger deviations in verbal coordination are accompanied by increases in HFa. Additionally, we also tested for potential effects of different asynchronies (i.e., temporal delay) between the participant's speech and that of the virtual partner but found no significant differences (Fig.S6). While the lack of a delay effect does not allow us to draw conclusions about the sensitivity of BA44 to the absolute timing of the partner’s speech, its neural dynamics are linked to the ongoing process of resolving phase deviations and maintaining synchrony.”
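
      For readers unfamiliar with phase-amplitude coupling, one common estimator is the mean vector length: each amplitude sample is treated as a vector pointing in the direction of the concurrent phase, and the length of the average vector is taken as the coupling strength. The sketch below illustrates that estimator only. This passage does not state which PAC estimator the authors used, so the choice of mean vector length, the function name, and the toy signals are all assumptions.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """Mean vector length: |mean(a_t * exp(i * phi_t))|.

    phases: low-frequency phase in radians (e.g. a behavioural phase signal).
    amplitudes: concurrent high-frequency amplitude envelope.
    """
    if len(phases) != len(amplitudes) or not phases:
        raise ValueError("phases and amplitudes must be non-empty and equal in length")
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / len(phases)


n = 1000
phases = [2.0 * math.pi * t / n for t in range(n)]
coupled = [1.0 + math.cos(p) for p in phases]  # amplitude peaks at phase 0
flat = [1.0] * n                               # amplitude independent of phase

print(round(mean_vector_length(phases, coupled), 3))  # 0.5
print(round(mean_vector_length(phases, flat), 3))     # 0.0
```

      In practice the raw value depends on overall amplitude, so it is usually compared against a surrogate distribution before being interpreted.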

      (2) In the results section, there's a general lack of quantification. While some of the statistics reported in the figures are helpful, there are also claims that are stated without any statistical test. For example, in the paragraph starting on line 342, it is claimed that there is an inverse relationship between rho-value and frequency band, "possibly due to the reversed desynchronization/synchronization process in low and high frequency bands". Based on Figure 3, the first part of this statement appears to be true qualitatively, but is not quantified, and is therefore impossible to assess in relation to the second part of the claim. Similarly, the next paragraph on line 348 describes optimal clustering, but statistics of the clustering algorithm and silhouette metric are not provided. More importantly, it's not entirely clear what is being clustered - is the point to identify activity patterns that are similar within/across brain regions? Or to interpret the meaning of the specific patterns? If the latter, this is not explained or explored in the paper.

      The reviewer is right. We have now added statistical analyses showing that:

      (1) the ratio between synchronization and desynchronization evolves across frequencies (as often reported in the literature).

      (2) the sign of rho values also evolves across frequencies.

      (3) the clustering does indeed differ when taking into account behaviour. We have also clarified the use of clustering and the reasoning behind it.

      We have updated the Materials and methods section as follows:

      “The statistical difference between spatial clustering in global effect and brain-behaviour correlation was estimated with a linear model using the R function lm (stats package); post-hoc comparisons were corrected for multiple comparisons using the Tukey test (lsmeans R package; Lenth, 2016). The statistical difference between clustering in global effect and behaviour correlation across the number of clusters was estimated using permutation tests (N=1000) by computing the silhouette score difference between the two conditions.”

      We have updated the Results section as follows:

      (1) “This modulation between synchronization and desynchronization across frequencies was significant (F(5) = 6.42, p < .001 ; estimated with linear model using the R function lm).”

      (2) “The first observation is a gradual transition in the direction of correlations as we move up frequency bands, from positive correlations at low frequencies to negative ones at high frequencies (F(5) = 2.68, p = .02). This effect, present in both hemispheres, mimics the reversed desynchronization/synchronization process in low and high frequency bands reported above.”

      (3) “Importantly, compared to the global activity (task vs rest, Fig 3A), the neural spatial profile of the behaviour-related activity (Fig 3B) is more clustered, in the left hemisphere. Indeed, silhouette scores are systematically higher for behaviour-related activity compared to global activity, indicating greater clustering consistency across frequency bands (t(106) = 7.79, p < .001, see Figure S3). Moreover, silhouette scores are maximal, in particular for HFa, for five clusters (p < .001), located in the IFG BA44, the IPL BA 40 and the STG BA 41/42 and BA22 (see Figure S3).”
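
      The permutation test on silhouette score differences can be illustrated with a generic label-shuffling scheme. This is a sketch under assumptions: the response gives only the number of permutations (N=1000) and the statistic (difference in silhouette scores), so the pooling-and-shuffling procedure, the two-sided p-value, the function name, and the example scores below are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference in mean silhouette score."""
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign condition labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical silhouette scores for the two conditions.
behaviour_related = [0.60, 0.62, 0.58, 0.65, 0.61, 0.63]
global_activity = [0.30, 0.28, 0.35, 0.32, 0.31, 0.29]
observed, p_value = permutation_test(behaviour_related, global_activity)
print(observed > 0, p_value < 0.05)
```

      With 1000 permutations the smallest attainable p-value under this correction is 1/1001, which is compatible with reporting p < .001.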

      (3) Given the design of the stimuli, it would be useful to know more about how coordination relates to specific speech units. The authors focus on the syllabic level, which is understandable. But as far as the results relate to speech planning (an explicit point in the paper), the claims could be strengthened by determining whether the coordination signal (whether error correction or otherwise) is specifically timed to e.g., the consonant vs the vowel. If the mechanism is a phase reset, does it tend to occur on one part of the syllable?

      Thank you for this thoughtful feedback. We agree that the relationship between speech coordination and specific speech units, such as consonants versus vowels, is an intriguing question. However, in our study, both interlocutors (the participant and the virtual partner) are adapting their speech production in real-time. This interactive coordination makes it difficult to isolate neural signatures corresponding to precise segments like consonants or vowels, as the adjustments occur in a continuous and dynamic context.

      The VP's ability to adapt depends on its sensitivity to spectral cues, such as the transition from one phonetic element to another. This is likely influenced by the type of articulation, with certain transitions being more salient (e.g., between a stop consonant like "p" and a vowel like "a") and others being less distinct (e.g., between nasal consonants like "m" and a vowel). Thus, the VP’s spectral adaptation tends to occur at these transitions, which are more prominent in some cases than in others.

      For the participants, previous studies have shown a greater sensitivity during the production of stressed vowels (Oschkinat & Hoole, 2022; Li & Lancia, 2024), which may reflect a heightened attentional or motor adjustment to stressed syllables.

      Here, we did not specifically address the question of coordination at the level of individual linguistic units. Moreover, even if we attempted to focus on this level, it would be challenging to relate neural dynamics directly to specific speech segments. The question of how synchronization at the level of individual linguistic units might relate to neural data is complex. The lack of clear, unit-specific predictions makes it difficult to parse out distinct neural signatures tied to individual segments, particularly when both interlocutors are continuously adjusting their speech in relation to one another.

      Therefore, while we recognize the potential importance of examining synchronization at the level of individual phonetic elements, the design of our task and the nature of the coordination in this interactive context (real-time bidirectional adaptation) led us to focus more broadly on the overall dynamics of speech synchronization at the syllabic level, rather than on specific linguistic units.

      We now state at the end of the Discussion section:

      “It is worth noting that the influence of specific speech units, such as consonants versus vowels, on speech coordination remains to be explored. In non-interactive contexts, participants show greater sensitivity during the production of stressed vowels, possibly reflecting heightened attentional or motor adjustments (Oschkinat & Hoole, 2022; Li & Lancia, 2024). In this study, the VP’s adaptation relies on sensitivity to spectral cues, particularly phonetic transitions, with some (e.g., formant transitions) being more salient than others. However, how these effects manifest in an interactive setting remains an open question, as both interlocutors continuously adjust their speech in real time. Future studies could investigate whether coordination signals, such as phase resets, preferentially align with specific parts of the syllable.”

      References cited:

      – Oschkinat, M., & Hoole, P. (2022). Reactive feedback control and adaptation to perturbed speech timing in stressed and unstressed syllables. Journal of Phonetics, 91, 101133.

      – Li, J., & Lancia, L. (2024). A multimodal approach to study the nature of coordinative patterns underlying speech rhythm. In Proc. Interspeech, 397-401.

      (4) In the discussion the results are related to a previously-described speech-induced suppression effect. However, it's not clear what the current results have to do with SIS, since the speaker's own voice is present and predictable from the forward model on every trial. Statements such as "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice" are highly speculative and apparently not supported by the data.

      We thank the reviewer for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised Discussion section, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context". Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised Discussion also incorporates findings by Ozker et al. (2022, 2024), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of behavioural synchrony increases. This result is reminiscent of findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally-generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2022, 2024), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection. In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020). Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      References cited:

      – Franken, M. K., Hartsuiker, R. J., Johansson, P., Hall, L., & Lind, A. (2021). Speaking With an Alien Voice: Flexible Sense of Agency During Vocal Production. Journal of Experimental Psychology-Human perception and performance, 47(4), 479-494. https://doi.org/10.1037/xhp0000799

      – Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in human neuroscience, 5, 82.

      – Lind, A., Hall, L., Breidegard, B., Balkenius, C., & Johansson, P. (2014). Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say. Psychological Science, 25(6), 1198-1205. https://doi.org/10.1177/0956797614529797

      – Meekings, S., & Scott, S. K. (2021). Error in the Superior Temporal Gyrus? A Systematic Review and Activation Likelihood Estimation Meta-Analysis of Speech Production Studies. Journal of Cognitive Neuroscience, 33(3), 422-444. https://doi.org/10.1162/jocn_a_01661

      – Niziolek, C. A., Nagarajan, S. S., & Houde, J. F. (2013). What does motor efference copy represent? Evidence from speech production. Journal of Neuroscience, 33, 16110–16116.

      – Ozker, M., Doyle, W., Devinsky, O., & Flinker, A. (2022). A cortical network processes auditory error signals during human speech production to maintain fluency. PLoS Biology, 20.

      – Ozker, M., Yu, L., Dugan, P., Doyle, W., Friedman, D., Devinsky, O., & Flinker, A. (2024). Speech-induced suppression and vocal feedback sensitivity in human cortex. eLife, 13, RP94198. https://doi.org/10.7554/eLife.94198

      – Zheng, Z. Z., MacDonald, E. N., Munhall, K. G., & Johnsrude, I. S. (2011). Perceiving a Stranger's Voice as Being One's Own: A 'Rubber Voice' Illusion? PLOS ONE, 6(4), e18655.

      (5) There are some seemingly arbitrary decisions made in the design and analysis that, while likely justified, need to be explained. For example, how were the cutoffs for moderate coupling vs phase-shifted coupling (k ~0.09) determined? This is noted as "rather weak" (line 212), but it's not clear where this comes from. Similarly, the ROI-based analyses are only done on regions "recorded in at least 7 patients" - how was this number chosen? How many electrodes total does this correspond to? Is there heterogeneity within each ROI?

      The reviewer is correct, and we apologize for this missing information. We now specify that the coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, while keeping the phase-shifted coupling at a rather implicit level.

      Concerning the definition of coupling as weak, one should consider that, in the Kuramoto model, the strength of coupling (k) is relative to the spread of the natural frequencies (Δω) in the system. In our study, the natural frequencies of syllables range approximately from 2 Hz to 10 Hz, resulting in a frequency spread of Δω = 8 Hz. For coupling to strongly synchronize oscillators across such a wide range, k must be comparable to or exceed Δω. Since k ≈ 0.1 is far smaller than Δω, it is classified as weak coupling.
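
      The argument that k must be comparable to Δω can be checked on the simplest Kuramoto-type case, in which one oscillator adapts to another and the phase difference φ obeys dφ/dt = Δω − k·sin(φ). A fixed point (phase locking) exists only when |Δω| ≤ |k|. The sketch below is a textbook reduction under stated assumptions, not the authors' virtual partner implementation: the one-way coupling, the sign convention, the treatment of k and Δω in the same units, and the function names are all hypothetical.

```python
import math


def locks_analytically(delta_omega: float, k: float) -> bool:
    """dphi/dt = delta_omega - k*sin(phi) has a fixed point iff |delta_omega| <= |k|."""
    return abs(delta_omega) <= abs(k)


def locks_numerically(delta_omega, k, dt=0.001, duration=200.0):
    """Euler-integrate the phase difference and report whether it stays bounded."""
    phi = 0.0
    for _ in range(int(duration / dt)):
        phi += dt * (delta_omega - k * math.sin(phi))
    # A drifting (unlocked) phase difference grows far beyond one full cycle.
    return abs(phi) < 2.0 * math.pi


# Frequency spread from the text (8) against the coupling strength (0.1).
print(locks_analytically(8.0, 0.1), locks_numerically(8.0, 0.1))    # False False
# A detuning smaller than k does lock.
print(locks_analytically(0.05, 0.1), locks_numerically(0.05, 0.1))  # True True
```

      Under these assumptions a coupling of about 0.1 can only pull in oscillators whose frequencies are already nearly matched, which is the sense in which the coupling is weak.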

      We have now modified the Materials and methods section as follows:

      “More precisely, for a third of the trials the VP had a neutral behaviour (close to zero coupling: k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = -0.09). And for the last third of the trials the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”

      Regarding the criterion of including regions recorded in at least 7 patients, our goal was to balance data completeness with statistical power. Given our total sample of 16 patients, this threshold ensures that each included region is represented in at least ~44% of the cohort, reducing the likelihood of spurious findings due to extremely small sample sizes. This choice also aligns with common neurophysiological analysis practices, where a minimum number of subjects (at least 2 in extreme cases) is required to achieve meaningful interindividual comparisons while avoiding excessive data exclusion. Additionally, this threshold maintains a reasonable tradeoff between maximizing patient inclusion and ensuring that statistical tests remain robust.

      We have now added more information in the Results section “Spectral profiles in the language network are nuanced by behaviour” on this point as follows:

      “To balance data completeness and statistical power, we included only brain regions recorded in at least 7 patients (~44% of the cohort) for the left hemisphere and at least 5 patients for the right hemisphere (~31% of the cohort), ensuring sufficient representation while minimizing biases due to sparse data.”
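
      The inclusion criterion amounts to counting distinct patients per region and comparing the count to a hemisphere-specific minimum. The sketch below illustrates this with hypothetical patient identifiers and region labels; only the cohort size (16) and the thresholds (7 for the left hemisphere, 5 for the right) come from the text.

```python
from collections import defaultdict

COHORT_SIZE = 16                           # total patients, from the response
MIN_PATIENTS = {"left": 7, "right": 5}     # thresholds, from the response


def regions_with_enough_patients(recordings, min_patients):
    """Keep regions recorded in at least `min_patients` distinct patients.

    recordings: iterable of (patient_id, region) pairs, one per electrode contact.
    """
    patients_by_region = defaultdict(set)
    for patient_id, region in recordings:
        patients_by_region[region].add(patient_id)
    return sorted(
        region
        for region, patients in patients_by_region.items()
        if len(patients) >= min_patients
    )


# Hypothetical left-hemisphere contacts: BA44 in 7 patients, BA40 in 3.
left_recordings = [(p, "BA44") for p in range(7)] + [(p, "BA40") for p in range(3)]
left_recordings.append((0, "BA44"))  # a second contact in the same patient is not double-counted

print(regions_with_enough_patients(left_recordings, MIN_PATIENTS["left"]))  # ['BA44']
print(round(100 * MIN_PATIENTS["left"] / COHORT_SIZE))   # 44
print(round(100 * MIN_PATIENTS["right"] / COHORT_SIZE))  # 31
```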

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the neural underpinnings of an interactive speech task requiring verbal coordination with another speaker. To achieve this, the authors recorded intracranial brain activity from the left hemisphere in a group of drug-resistant epilepsy patients while they synchronised their speech with a 'virtual partner'. Crucially, the authors were able to manipulate the degree of success of this synchronisation by programming the virtual partner to either actively synchronise or desynchronise their speech with the participant, or else to not vary its speech in response to the participant (making the synchronisation task purely one-way). Using such a paradigm, the authors identified different brain regions that were either more sensitive to the speech of the virtual partner (primary auditory cortex), or more sensitive to the degree of verbal coordination (i.e. synchronisation success) with the virtual partner (secondary auditory cortex and IFG). Such sensitivity was measured by (1) calculating the correlation between the index of verbal coordination and mean power within a range of frequency bands across trials, and (2) calculating the phase-amplitude coupling between the behavioural and brain signals within single trials (using the power of high-frequency neural activity only). Overall, the findings help to elucidate some of the left hemisphere brain areas involved in interactive speaking behaviours, particularly highlighting the high-frequency activity of the IFG as a potential candidate supporting verbal coordination.

      Strengths:

      This study provides the field with a convincing demonstration of how to investigate speaking behaviours in more complex situations that share many features with real-world speaking contexts e.g. simultaneous engagement of speech perception and production processes, the presence of an interlocutor, and the need for inter-speaker coordination. The findings thus go beyond previous work that has typically studied solo speech production in isolation, and represent a significant advance in our understanding of speech as a social and communicative behaviour. It is further an impressive feat to develop a paradigm in which the degree of cooperativity of the synchronisation partner can be so tightly controlled; in this way, this study combines the benefits of using pre-recorded stimuli (namely, the high degree of experimental control) with the benefits of using a live synchronisation partner (allowing the task to be truly two-way interactive, an important criticism of other work using pre-recorded stimuli). A further key strength of the study lies in its employment of stereotactic EEG to measure brain responses with both high temporal and spatial resolution, an ideal method for studying the unfolding relationship between neural processing and this dynamic coordination behaviour.

      We sincerely appreciate the Reviewer's thoughtful and positive feedback on our manuscript.

      Weaknesses:

      One major limitation of the current study is the lack of coverage of the right hemisphere by the implanted electrodes. Of course, electrode location is solely clinically motivated, and so the authors did not have control over this. However, this means that the current study neglects the potentially important role of the right hemisphere in this task. The right hemisphere has previously been proposed to support feedback control for speech (likely a core process engaged by synchronous speech), as opposed to the left hemisphere which has been argued to underlie feedforward control (Tourville & Guenther, 2011). Indeed, a previous fMRI study of synchronous speech reported the engagement of a network of right hemisphere regions, including STG, IPL, IFG, and the temporal pole (Jasmin et al., 2016). Further, the release from speech-induced suppression during a synchronous speech reported by Jasmin et al. was found in the right temporal pole, which may explain the discrepancy with the current finding of reduced leftward high-frequency activity with increasing verbal coordination (suggesting instead increased speech-induced suppression for successful synchronisation). The findings should therefore be interpreted with the caveat that they are limited to the left hemisphere, and are thus likely missing an important aspect of the neural processing underpinning verbal coordination behaviour.

      We have now included, in the supplementary materials, data from the right hemisphere, although the coverage is a bit sparse (Figures S2, S4, S5, see our responses in the ‘Recommendation for the authors’ section, below). We have also revised the Discussion section to add the putative role of right temporal regions (see below as well).

      A further limitation of this study is that its findings are purely correlational in nature; that is, the results tell us how neural activity correlates with behaviour, but not whether it is instrumental in that behaviour. Elucidating the latter would require some form of intervention such as electrode stimulation, to disrupt activity in a brain area and measure the resulting effect on behaviour. Any claims therefore as to the specific role of brain areas in verbal coordination (e.g. the role of the IFG in supporting online coordinative adjustments to achieve synchronisation) are therefore speculative.

      We appreciate the reviewer’s observation regarding the correlational nature of our findings and agree that this is a common limitation of neuroimaging studies. While elucidating causal relationships would indeed require intervention techniques such as electrical stimulation, our study leverages the unique advantages of intracerebral recordings, offering the best available spatial and temporal resolution alongside a high signal-to-noise ratio. These attributes ensure that our data accurately reflect neural activity and its temporal dynamics, providing a robust foundation for understanding the relationship between neural processes and behaviour. Therefore, while causal claims are beyond the scope of this study, the precision of our methodology allows us to make well-supported observations about the neural correlates of synchronous speech tasks.

      Recommendations for the authors:

      Reviewing Editor Comment:

      After joint consultation, we are seeing the potential for the report to be strengthened and the evidence here to be deemed ultimately at least 'solid': to us (editors and reviewers) it seems that this would require both (1) clarifying/acknowledging the limitations of not having right hemisphere data, and (2) running some of the additional analyses the reviewers suggest, which should allow for richer examination of the data e.g. phase relationships in areas that correlate with synchronisation.

      We have now added data on the right hemisphere (RH) that we did not previously report due to a rather sparse sampling of the RH. These results are now reported in the Results section as well as in the Supplementary section, where we put all right hemisphere figures for all analyses (Figure S2, S4, S5). We have also run additional analyses digging into the phase relationship in areas that correlate with synchronisation (Figure S6). These additional analyses allowed us to improve the Discussion section as well.

      Reviewer #1 (Recommendations For The Authors):

      In some sections, the writing is a bit unclear, with both typos and vague statements that could be fixed with careful proofreading.

      We thank the reviewer for pointing out areas where the writing could be improved. We carefully proofread the manuscript to address typos and clarify any vague statements. Specific sections identified as unclear have been rephrased for better precision and readability.

      In Figure 1, the colors repeat, making it impossible to tell patients apart.

      We have now updated the Figure 1 colormap to avoid repeated colours and added the right hemisphere.

      Line 132: "16 unilateral implantations (9 left, 7 bilateral implantations)". Should this say 7 right hemisphere? If so, the following sentence stating that there was "insufficient cover [sic] of the right hemisphere" is unclear, since the number of patients between LH and RH is similar.

      The confusion arose because the lateralization referred exclusively to the presence or absence of electrodes in Heschl’s gyrus (left: H’; right: H).

      We have thus changed this section as follows:

      “16 patients (7 women, mean age 29.8 y, range 17 - 50 y) with pharmacoresistant epilepsy took part in the study. They were included if their implantation map covered at least partially the Heschl's gyrus and had sufficiently intact diction to support relatively sustained language production.” The relevant part (previously line 132) now states:

      “Sixteen patients with a total of 236 electrodes (145 in the left hemisphere) and 2395 contacts (1459 in the left hemisphere, see Figure 1). While this gives a rather sparse coverage of the right hemisphere, we decided, due to the rarity of this type of data, to report results for both hemispheres, with figures for the left hemisphere in the main text and figures for the right hemisphere in the supplementary section.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To address the concern regarding the absence of data from the right hemisphere, I would advise the authors to directly acknowledge this limitation in their Discussion section, citing relevant work suggesting that the right hemisphere has an important role to play in this task (e.g. Jasmin et al., 2016). You should also make this clear in your abstract e.g. you could rewrite the sentence in line 40 to be: "Then, we recorded the intracranial brain activity of the left hemisphere in 16 patients with drug-resistant epilepsy...".

      We are grateful to the reviewer for this comment, which prompted us to look into the right hemisphere data. We have now included results in the right hemisphere, although the coverage is a bit sparse. We have also revised the Discussion section to add the putative role of right temporal regions. Interestingly, our results show, as suggested by the reviewer, a clear involvement of the RH in this task.

      First, the full brain analyses show a very similar involvement of the RH compared to the LH (see Figure below). We have now added in the Results section:

      “As expected, the whole language network is strongly involved, including both dorsal and ventral pathways (Fig 3A). More precisely, in the left temporal lobe the superior, middle and inferior temporal gyri, in the left parietal lobe the inferior parietal lobule (IPL) and in the left frontal lobe the inferior frontal gyrus (IFG) and the middle frontal gyrus (MFG). Similar results are observed in the right hemisphere, neural responses being present across all six frequency bands with medium to large modulation in activity compared to baseline (Figure S2A) in the same regions. Desynchronizations are present in the theta, alpha and beta bands while the low gamma and HFa bands show power increases.”

      Compared to the left hemisphere, assessing brain-behaviour correlations in the right hemisphere does not provide the same statistical power, because some anatomical regions have very few electrodes. Nonetheless, we observe a strong correlation in the right IFG, similar to the one we previously reported in the left hemisphere, and we now report in the Results section:

      “The decrease in HFa along the dorsal pathway is replicated in the right hemisphere (Figure S4). However, while both the right STG BA41/42 and STG BA22 present a power increase compared to baseline (stronger for the STG BA41/42), neither shows a significant correlation with verbal coordination (t(45)=-1.65, p=.1 ; t(8)=-0.67, p=.5 ; Student’s T test, FDR correction). By contrast, results in the right IFG BA44 are similar to those observed in the left hemisphere, with a significant power increase associated with a negative brain-behaviour correlation (t(17) = -3.11, p = .01 ; Student’s T test, FDR correction).”
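
      The text reports FDR-corrected tests without naming the procedure. As a hedged illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, which is one common choice; the function name is hypothetical and the authors may have used a different method or a library routine.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: the FDR correction is the Benjamini-Hochberg procedure.
    The response letter does not state which procedure was used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value
    # is never larger than the adjusted value of the next rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      A region would then be reported as significant when its adjusted p-value falls below the chosen FDR level (for example 0.05).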

      Interestingly, the phase-amplitude coupling analysis yields very similar results in both hemispheres (except for BA22). We have thus updated the Results section as follows:

      “Notably, when comparing – within the regions of interest previously described – the PAC with the virtual partner speech and the PAC with the phase difference, the coupling relationship changes when moving along the dorsal pathway: a stronger coupling in the auditory regions with the speech input, no difference between speech and coordination dynamics in the IPL and a stronger coupling for the coordinative dynamics compared to speech signal in the IFG (Figure 5B). When looking at the right hemisphere, we observe the same changes in the coupling relationship when moving along the dorsal pathway, except that no difference between speech and coordination dynamics is present in the right secondary auditory regions (STG BA22; Figure S5).”
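
      For readers unfamiliar with phase-amplitude coupling (PAC), the sketch below shows one widely used estimator, the mean vector length. This is an assumption for illustration: the response letter does not say which PAC estimator was used, and the function name and inputs are hypothetical. Here `phases` would stand for the phase of the slow behavioural signal (the virtual partner's speech or the phase difference) and `amplitudes` for the high-frequency neural envelope.

```python
import cmath


def mean_vector_length(phases, amplitudes):
    """Mean-vector-length coupling between a phase series and an amplitude series.

    phases: phase of the slow signal in radians, one value per sample.
    amplitudes: envelope of the fast signal, same length as `phases`.
    Returns a non-negative number; larger values mean the amplitude
    depends more strongly on the phase.
    """
    if len(phases) != len(amplitudes) or not phases:
        raise ValueError("phases and amplitudes must be non-empty and equal in length")
    # Each sample is a vector of length `amplitude` pointing at angle `phase`.
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / len(phases)
```

      Comparing this value between two phase signals for the same neural envelope is one way to express the contrast described above. In practice the raw value is usually normalised against surrogate data, which this sketch omits.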

      We also included the right hemisphere results in the Discussion section, with reference to the previous work of Guenther and of Jasmin. The section “Left secondary auditory regions are more sensitive to coordinative behaviour” now reads:

      “Furthermore, the absence of correlation in the right STG BA22 (Figure S4) seems at first glance to challenge influential speech production models (e.g. Guenther & Hickok, 2016) that propose that the right hemisphere is involved in feedback control. However, one needs to consider that the task at stake relied heavily upon temporal mismatches and adjustments. In this context, the left-lateralized sensitivity to verbal coordination is reminiscent of the work of Floegel and colleagues (2020, 2023) suggesting that both hemispheres are involved depending on the type of error: the right auditory association cortex preferentially monitoring spectral speech features and the left auditory association cortex preferentially monitoring temporal speech features. Nonetheless, the right temporal pole seems to be sensitive to speech coordinative behaviour, confirming previous findings using fMRI (Jasmin et al., 2016) and thus showing that the right hemisphere has an important role to play in this type of task.”

      References cited:

      – Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      – Floegel, M., Kasper, J., Perrier, P., & Kell, C. A. (2023). How the conception of control influences our understanding of actions. Nature Reviews Neuroscience, 24(5), 313-329.

      – Guenther, F. H., & Hickok, G. (2016). Neural models of motor speech control. In Neurobiology of language (pp. 725-740). Academic Press.

      (2) When discussing previous work on alignment during synchronous speech, you may wish to include a recently published paper by Bradshaw et al (2024); this manipulated the acoustics of the accompanist's voice during a synchronous speech task to show interactions between speech motor adaptation and phonetic convergence/alignment.

      We thank the reviewer for pointing to this recent and interesting paper. We have added the article as a reference as follows:

      “Furthermore, synchronous speech favors the emergence of alignment phenomena, for instance of the fundamental frequency or the syllable onset (Assaneo et al., 2019 ; Bradshaw & McGettigan, 2021 ; Bradshaw et al., 2023; Bradshaw et al., 2024).”

      (3) Line 80: "Synchronous speech resembles to a certain extent to delayed auditory feedback tasks"- I think you mean "altered auditory feedback tasks" here.

      Synchronous speech is more about timing than about altered speech signals, which is why the comparison is made with delayed rather than altered auditory feedback. Nonetheless, we understand the Reviewer’s point and we have now changed the sentence as follows:

      “Synchronous speech resembles, to a certain extent, delayed/altered auditory feedback tasks”

      (4) When discussing superior temporal responses during such altered feedback tasks, you may also want to cite a review paper by Meekings and Scott (2021).

      We thank the reviewer for this suggestion, indeed this was a big oversight!

      The paper is now quoted in the introduction as follows:

      “Previous studies have revealed increased responses in the superior temporal regions compared to normal feedback conditions (Hirano et al., 1997 ; Hashimoto & Sakai, 2003 ; Takaso et al., 2010 ; Ozker et al., 2022 ; Floegel et al., 2020 ; see Meekings & Scott, 2021 for a review of error-monitoring and feedback control in the STG during speech production).”

      Furthermore, we updated the part of the Discussion concerning the speaker-induced suppression phenomenon (see our response to point 10 below).

      (5) Line 125: "The parameters and sound adjustment were set using an external low-latency sound card (RME Babyface Pro Fs)". Can you please report the total feedback loop latency in your set-up? Or at the least cite the following paper which reports low latencies with this audio device.

      Kim, K. S., Wang, H., & Max, L. (2020). It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback. Journal of Speech, Language, and Hearing Research, 63(8), 2522-2534. https://doi.org/10.1044/2020_JSLHR-19-00419

      We now report the total feedback loop latency (~5 ms) and also cite the relevant paper (Kim et al., 2020).

      (6) Line 127 "A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli." What do you mean here by an 'optimal balance'? Was the participant's own voice always louder than the VP stimuli? Can you report roughly what you consider to be a comfortable volume in dB?

      This point was indeed unclear. We have now changed the text as follows:

      “A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli. The aim of this procedure was that the patient would subjectively perceive their voice and the VP-voice in equal measure. VP voice was delivered at approximately 70dB.”

      (7) Relatedly, did you use any noise masking to mask the air-conducted feedback from their own voice (which would have been slightly out of phase with the feedback through the headphones, depending on your latency)?

      Given the low latency afforded by the sound card (RME Babyface Pro Fs), we did not use noise masking to mask the air-conducted feedback from the patients' own voice.

      (8) Line 141: "four short sentences were pre-recorded by a woman and a man." Did all participants synchronise with both the man and woman or was the VP gender matched to that of the participant/patient?

      We thank the reviewer for pointing out this important missing detail. We have now changed the text as follows:

      “Four stimuli corresponding to four short sentences were pre-recorded by both a female and a male speaker. This allowed us to adapt to the natural gender differences in fundamental frequency (i.e. so that the VP gender matched that of the patients). All stimuli were normalised in amplitude.”

      (9) Can you clarify what instructions participants were given regarding the VP? That is, were they told that this was a recording or a real live speaker? Were they naïve to the manipulation of the VP's coupling to the participant?

      We have now added this information to the task description as follows:

      “Participants, comfortably seated in a medical chair, were instructed that they would perform a real-time interactive synchronous speech task with an artificial agent (Virtual Partner, henceforth VP, see next section) that can modulate and adapt to the participant’s speech in real time.”

      “The third step was the actual experiment. This was identical to the training but consisted of 24 trials (14s long, speech rate ~3Hz, yielding ~1000 syllables). Importantly, the VP varied its coupling behaviour to the participant. More precisely, for a third of the sequences the VP had a neutral behaviour (close to zero coupling : k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = - 0.09). And for the last third of the sequences the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”  
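
      The design described above (24 trials of 14 s at about 3 Hz, split into thirds across three coupling conditions) can be laid out as a short sketch. The trial counts, durations and coupling values come from the text. The condition labels, the function names and the shuffling of trial order are assumptions made for illustration, since the text does not say how trials were ordered.

```python
import random

# Values taken from the task description.
N_TRIALS = 24
TRIAL_DURATION_S = 14
SPEECH_RATE_HZ = 3

# Condition labels are hypothetical names; the k values are from the text.
COUPLING_BY_CONDITION = {
    "neutral": 0.01,        # reported as k = +/- 0.01; magnitude shown here
    "synchronising": -0.09, # VP synchronises more to the participant
    "phase_shifted": 0.09,  # VP aims between the participant's syllables
}


def build_schedule(seed=0):
    """Return a list of condition labels, one third of the trials per condition.

    Shuffling the order is an assumption; the source does not describe it.
    """
    per_condition = N_TRIALS // len(COUPLING_BY_CONDITION)
    schedule = [c for c in COUPLING_BY_CONDITION for _ in range(per_condition)]
    random.Random(seed).shuffle(schedule)
    return schedule


def expected_syllables():
    """Approximate syllable count across the whole experiment."""
    return N_TRIALS * TRIAL_DURATION_S * SPEECH_RATE_HZ
```

      The product 24 x 14 x 3 gives 1008, consistent with the "~1000 syllables" stated in the quoted passage, and each condition receives 8 trials.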

      (10) The paragraph from line 438 entitled "Secondary auditory regions are more sensitive to coordinative behaviour" includes an interesting discussion of the relation of the current findings to the phenomenon of speech-induced suppression (SIS). However, the authors appear to equate the observed decrease in high-frequency activity as speech coordination increases with the phenomenon of SIS (in lines 456-457), which is quite a speculative leap. I would encourage the authors to temper this discussion by referring to SIS as a potentially related phenomenon, with a need for more experimental work to determine if this is indeed the same phenomenon as the decreases in high-frequency power observed here. I believe that the authors are arguing here for an interpretation of SIS as reflecting internal modelling of sensory input regardless of whether this is self-generated or other-generated; if this is indeed the case, I would ask the authors to be more explicit here that these ideas are not a standard part of the traditional account of SIS, which only includes internal modelling of self-produced sensory feedback.

      As stated in the public review, we thank both reviewers for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised discussion, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context." Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised discussion also incorporates findings by Ozker et al. (2024, 2022), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of synchrony increases. This result aligns with findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection.

      In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020).”

      (11) Within this section, you also speculate in line 460 that "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice." I would recommend citing studies on the 'rubber voice' effect to back up this claim (e.g. Franken et al., 2021; Lind et al., 2014; Zheng et al., 2011).

      We are grateful to the Reviewer for this interesting suggestion. Directly following the previous comment, the section now states:

      “Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      (12) As noted in my public review, since your methods are correlational, you need to be careful about inferring the causal role of any brain areas in supporting a specific aspect of functioning e.g. line 501-504: "By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that reflects the amount of error in the internal computation to reach optimal coordination, which indicates that this region optimises the predictive and coordinative behaviour required by the task." I would argue that the latter part of this sentence is a conclusion that, although consistent with, goes beyond the current data in this study, and thus needs tempering.

      We agree with the Reviewer and changed the sentence as follows:

      “By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that could possibly reflect the amount of error in the internal computation to reach optimal coordination. This suggests that this region could be involved in optimising the predictive and coordinative behaviour required by the task.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      Recommendations  Analysis: 

      (1) Given that a MER21B/C LTR was not immediately identified at the start site of the Liz lncRNA in the mouse, and its match is only 46%, this raises the question of whether an analogous LTR would be identified at the homologous location in other species on deeper analysis. The authors need to argue that what has been conserved in the LTR alone in mouse is the essential element conferring the ability to initiate transcription of Liz. A transient reporter assay might be sufficient to do this. 

      We believe that the 46% identity between the first exon of mouse Liz and the consensus sequence of MER21C is so weak that its traces as MER21C are too attenuated to be detected by standard in silico analyses, such as homology searches. For instance, when pairwise alignments are performed between the first exon of mouse Liz and the consensus sequences of solo-LTRs other than MER21C, MER21C does not emerge as the most similar sequence (Figure 5 – figure supplement 1). This is in stark contrast to similar analyses involving the first exon of human and rabbit GPR1-AS (which overlaps with MER21C), where MER21C is identified as the most similar sequence. [pages: 26, 31-32]
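
      To make the comparison above concrete, the sketch below computes percent identity from an existing pairwise alignment and picks the consensus sequence with the highest score. It is an illustration under stated assumptions: identity is defined here as matches over ungapped columns (other conventions divide by the full alignment length), the function names are hypothetical, and the alignment itself is assumed to come from a separate tool. It is not the authors' analysis.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where both sequences have a base.

    Assumption: identity = matches / columns without a gap in either sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def best_match(query_alignments):
    """Return (name, scores) for the consensus most similar to the query.

    `query_alignments` maps a consensus name to a pair
    (aligned_query, aligned_consensus); this layout is hypothetical.
    """
    scores = {
        name: percent_identity(query, consensus)
        for name, (query, consensus) in query_alignments.items()
    }
    return max(scores, key=scores.get), scores
```

      Under this kind of comparison, a degenerate element can score similarly against several consensus sequences, which is the situation described for the mouse Liz first exon.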

      The positions of these LTRs were initially annotated using RepeatMasker. To ensure robust analysis, we performed additional searches with RepeatMasker under more sensitive conditions, adjusting search engines (e.g., RMblast to HMMER or Cross-match) and sensitivity settings. Nevertheless, MER21C or closely related LTRs were still undetectable in mouse, rat, and hamster (Figure 4 – figure supplement 1). However, a multiple genome alignment generated by Cactus/UCSC revealed a syntenic region corresponding to the first exon of human GPR1-AS, overlapping with MER21C, in the genomes of mice, as well as rats and hamsters (Figure 4 – figure supplement 2). Although RepeatMasker did not annotate MER21C at the GPR1 locus in these species, homologous regions were observed across all selected Euarchontoglires. Due to the limitations of the Cactus alignment track in delineating precise homologous boundaries across species, extracting sequences for evolutionary tree construction was not feasible. Nevertheless, these findings support the hypothesis that the first exon of GPR1-AS (Liz in mice) originated from a MER21C insertion in the common ancestor of Euarchontoglires. [pages: 21, 24-25]

      A combination of traditional annotation of repetitive elements using RepeatMasker and the reconstruction of ancestral genomes through multiple genome alignment can reveal highly degenerated LTR relics. This approach is likely to point to significant future directions for research. This point is further elaborated in the discussion section. [page 42]

      Furthermore, in response to the reviewer's suggestion, we investigated the promoter activity of the GPR1-AS and Liz first exons, which are hypothesized to have originated from the same MER21C insertion. Using a dual reporter assay, we demonstrated that the first exon of mouse Liz exhibits promoter activity in a human cell line comparable to that of the human GPR1-AS promoter. Thus, despite the relatively low sequence similarity between the Liz first exon and the MER21C consensus sequence (46% as determined by pairwise alignment, Figure 5 – figure supplement 2), the promoter activity remains functionally conserved. We further discuss the potential functional motifs within the putative MER21C LTR-derived sequences in Figure 4B-D. Taken together, these findings suggest that despite a high level of degeneracy of the promoter region in rodents, including mice, the most parsimonious explanation for the origin of this regulatory element in rodents is the presence of the same LTR relic detectable in humans/primates, which is essential for robust transcription initiation of Liz and GPR1-AS, respectively. [pages: 27, 32]

      (2) Imprinting will depend on an initiating mechanism in the germline, in addition to events in the embryo that induce the secondary DMR at ZDBF2. The authors should therefore examine as far as possible the presence of a gDMR in the species with/without GPR1-AS1 and ZDBF2 imprinting. Whole-genome bisulphite sequencing data from oocytes and sperm should exist for some of the relevant species (e.g., pig, cow: Ivanova et al. 2020 PMID: 32393379; Lu et al. 2012 PMID: 34818044). 

      As the reviewer noted, the presence of a gDMR is essential for the establishment of imprinting. Following another reviewer's suggestion, we have now demonstrated that the ZDBF2 gene in rhesus monkeys is also subject to imprinting (see Figure 3C-D). We also acquired whole genome bisulfite sequencing data for rhesus monkey sperm and oocytes, identified DMRs between them, and discovered an oocyte-specifically methylated gDMR in the first exon of GPR1-AS (which overlaps with MER21C) (Figure 3 – figure supplement 1A). This finding is consistent with observations in humans and mice. Conversely, we obtained similar sequencing data for porcine and bovine sperm and oocytes and conducted the same analysis (Figure 3 – figure supplement 1A,B). However, we did not detect any oocyte-specific methylated gDMRs in the GPR1 intragenic region (where GPR1-AS is transcribed from an intron of GPR1) in these species of the Laurasiatheria superorder. These results support the hypothesis that ZDBF2 is not imprinted in lineages outside the Euarchontoglires, the superorder which includes both rodents and primates. We have included these important DMR results as a supplement to Figure 3. [pages 16-21]

      Presentation: 

      (1) The first section of the Introduction would benefit from the inclusion of some additional general references on genomic imprinting. 

      We have added two review articles, Tucci et al. (2019) and Kobayashi (2021), as references in the first section of the Introduction. [page 5]

      (2) Introduction statement: "....nearly 200 imprinted genes have been identified in mice and humans. However, less than half of these genes overlapped in both species." This was the conclusion of one study (Tucci et al. 2016), so it would be better to provide a caveat to the statement "However, one comparative analysis suggested that fewer than half of these genes overlapped in both species". 

      The point being that the actual number of imprinted genes is still a matter of debate (see Edwards et al. 2023 PMID: 36916665), and the extent of overlap will depend on the strength of the evidence for each gene in the human and mouse imprinted gene lists. So, it is very difficult to put an accurate figure on the extent of overlap - but the authors' point is valid that there are species- or lineage-specific imprinted genes. 

      We have revised this sentence following reviewer #1's suggestion. [page 5]

      (3) Introduction statement: "The establishment of species-specific imprinting.....can be driven by various evolutionary events, including.....differences in the function of DNA methyltransferases". I am not aware that this has been described as an evolutionary event causing species-specific imprinting - without supporting evidence, I recommend to remove this suggestion. 

      We thank the reviewer for this comment and realize that we should have been more explicit here. We were referring to DNMT3C, a rodent-specific member of the DNMT3 family, which is responsible for the paternal methylation imprinting of Rasgrf1 in mice (Barau et al., Science, 2016), in association with the piRNA pathway and targeting of a specific retrotransposon within the DMR (Watanabe et al. Science, 2011). The Rasgrf1 gene is imprinted in mice but not considered imprinted in humans (though some conflicting data exist). While it is likely that the emergence of DNMT3C was a pre-requisite to the establishment of Rasgrf1 imprinting in evolutionary terms, clear evidence is lacking. Following the reviewer’s suggestion, we have removed the phrase "differences in the function of DNA methyltransferases" from the text. However, we have reintroduced this point in the Introduction section as a potential mechanism that may contribute to the establishment of species-specific imprinted genes, alongside the roles of ZNF445 and ZFP57, which regulate the maintenance of imprinting with partially divided roles between humans and mice. [page 6]

      (4) It would be very useful for readers to have a schema of the Gpr1/Zdbf2 locus that indicates the locations of the germline and somatic DMRs and their relationship to the Liz transcript. 

      (5) There is a summary figure amongst the Supplementary Figures (Suppl. Fig. 7) - it would be beneficial to readers to have this summary figure in the main text rather than the supplement. 

      Following reviewer #1’s suggestion, we have moved the regulatory system schema at the Gpr1/Zdbf2 locus, originally shown in Supplementary Figure 7, to the main text as Figure 7. In addition, in response to comment 4, we have revised the figure to explicitly depict the relationship between the Liz transcript and the establishment of the somatic DMR (sDMR), enhancing the clarity of the regulatory interactions at this locus. [page 38]

      (6) With a focus of the study on LTRs as cis-regulatory elements having been co-opted in genomic imprinting mechanisms - whether in the female germline (as in Bogutz et al. 2019) or in the current study as an activating element post-fertilisation - it is a real omission that the authors do not refer to the role of tissue-specific LTRs as the candidate regulatory elements in non-canonical imprinting (see Hanna et al. 2019 PMID: 31665063). Please include in Introduction and/or Discussion.

      We added a sentence explaining canonical and non-canonical imprinting and the cases where LTRs act as regulatory elements in non-canonical imprinting, with reference to the study of Hanna et al., as suggested. [page 6]

      (7) Discussion statement: "Two paternally expressed imprinted genes, PEG10/SIRH1 and PEG11/RTL1/SIRH2 have been identified in mammals. They encode GAG-POL proteins of sushi-ichi LTR retrotransposons and are essential for mammalian placenta formation and maintenance." 

      These sentences should be combined: "Two paternally expressed imprinted genes, PEG10/SIRH1, and PEG11/RTL1/SIRH2, that encode GAG-POL proteins of sushi-ichi LTR retrotransposons have been identified in mammals and are essential for mammalian placenta formation and maintenance." 

      We have revised this sentence according to reviewer #1's suggestion. [page 41]

      Reviewer #2 (Recommendations For The Authors): 

      When showing assembled GPR1-AS transcripts via genome browser tracks, it would be valuable to add normalized counts of reads mapping to each strand, in order to more convincingly demonstrate the existence of these transcripts. I ask for this because in my experience Stringtie will assemble transcripts that are only marginally supported by reads. 

      In response to Reviewer #2's suggestion, FPKM and TPM values for all StringTie-predicted GPR1-AS-like transcripts are now included in Figure 6. Each of these transcripts has a TPM value greater than 1, supporting their validity. [pages: 35]

      Reviewer #3 (Recommendations For The Authors): 

      (1) The tree in Figure 5A is one of the main arguments supporting the divergence of the mouse Liz promoter from a common MER21C element, but this contains only a handful of species, making it difficult to appreciate the full extent of its evolution. Presumably its faster mutation rate in mouse would also be supported by other closely related rodents, which would solidify the conclusion that the Liz promoter is derived from an ancient MER21C insertion. So my suggestion is to expand this tree substantially to other species, comparing sequences syntenic to the GPR1-AS/Liz promoter. 

      (2) It may also be worth trying different TE/LTR annotation tools and/or running Repeatmasker with different parameters, to see if an MER21C element is detected in mouse using a more sensitive approach. 

      In response to this suggestion, we performed computational analyses with RepeatMasker under various settings (e.g., switching search engines from RMblast to HMMER or Crossmatch, adjusting speed/sensitivity settings from default to slow). Despite these modifications, a MER21C element was not detected near the mouse Liz promoter. However, a multiple genome alignment track generated by Cactus/UCSC revealed a syntenic region, corresponding to the first exon of human GPR1-AS, which overlaps with LTR21C, also present in the genomes of mouse, rat, and hamster (Figure 4 – figure supplement 1). While RepeatMasker did not identify MER21C at the GPR1 locus in these species, homologous regions were observed across all selected Euarchontoglires. Although the Cactus alignment track does not delineate the exact boundaries of homologous regions across species (relative to humans) and thus precludes extracting each homologous sequence to construct an evolutionary tree, these findings support the hypothesis that the first exon of GPR1-AS (referred to as Liz in mice) originated from an ancient MER21C insertion in the common ancestor of Euarchontoglires. [pages: 21, 24-25]

      (3) According to Dfam, MER21C is not common to all eutherians, but specific to Boroeutheria, whilst MER21B is presumably specific to Euarchontoglires. To clarify MER21C/B evolution, it would be useful to show the number of elements present in select species from each group (including an outgroup). 

      (7) In Figure 4 it is hard to distinguish between red and purple. 

      Initially, we referenced Repbase (e.g., MER21C: Origin/Eutheria), but, as Reviewer #3 noted, Dfam should be the primary reference. We have now included the copy numbers of MER21C and MER21B for each genome in Figure 4, providing a clearer understanding of their evolutionary appearance (MER21C appears specific to Boroeutheria, while MER21B is specific to Euarchontoglires). Additionally, we adjusted the MER21B position color from purple to dark purple to improve visibility. Furthermore, we have also underlined the copy number of MER21C or MER21B located within the GPR1 region in each species. For example, in the Treeshrew genome, the LTR overlapping with GPR1-AS is annotated as MER21B, so we underlined the copy number of MER21B (2,305). These changes now clearly indicate whether homologous sequences to the first exon of GPR1-AS are annotated as MER21C or MER21B in each genome. [page 22]

      (4) Could the imprinting status of ZDBF2 not be determined in chimpanzees and rabbits? Or is it already known? Either way, a clarification would be useful to further support the concordance between GPR1-AS-like transcripts and ZDBF2 imprinting.

      The imprinting status of ZDBF2 had not previously been reported in chimpanzees, rhesus macaques, or rabbits, where GPR1-AS-like transcripts were identified. Therefore, we conducted allele-specific expression analysis of ZDBF2 using blood samples from rhesus macaques and rabbits. As expected, paternal-allele-specific expression of ZDBF2 was observed in both species, consistent with findings in humans and mice. These results have been added to Figure 3. Although we did not analyze the imprinting status in chimpanzees, we believe the existing data sufficiently support our hypothesis. [pages: 16, 19-20]

      (5) The authors briefly discuss the role of KRAB-ZFPs in controlling TE expression. An interesting addition would be to analyse the expression of the main KRAB-ZFP that binds to MER21C (ZFP789, according to data from PMID 28273063). This could be linked to the temporal control of MER21C expression. 

      In response to Reviewer #3's suggestion, we focused on the expression pattern of ZNF789 (noted by the reviewer as ZFP789), the primary KRAB-ZFP known to bind MER21C, as identified by Didier Trono’s group (PMID 28273063). Strikingly, our analysis reveals that ZNF789 is specifically downregulated at the 4-cell stage, which aligns with the timing of MER21C reactivation. While it remains to be determined whether this downregulation directly influences MER21C reactivation or the initiation of GPR1-AS expression, this finding is significant and consistent with our model. We have incorporated this information in Figure 5 – figure supplement 3. [pages: 33]

      (6) The sentence "Liz directs DNA methylation at the somatic DMR, which competes with ZDBF2 to repress the paternal allele" (introduction) was confusing to me. 

      This sentence has been revised to be more accurate, as follows: "Liz transcription counteracts the H3K27me3-mediated repression of Zdbf2 by promoting the deposition of antagonistic DNA methylation at the secondary DMR." [page 7]

      (8) In Figure 5 I take it that 'consensus motif' refers to ELF1/2? Maybe change the legend. 

      To clarify potential confusion around the term 'consensus motif,' which may have been mistaken for 'consensus MER21C' (the consensus sequence of MER21C-LTR from the Dfam database), we have revised the figure legend. We now refer to the motif as the "common motif," indicating the sequence common to all MER21C-derived sequences and overlapping with the first exon of GPR1-AS. [page 29]

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Glaser et al present ExA-SPIM, a light-sheet microscope platform with large volumetric coverage (Field of view 85mm^2, working distance 35mm), designed to image expanded mouse brains in their entirety. The authors also present an expansion method optimized for whole mouse brains and an acquisition software suite. The microscope is employed in imaging an expanded mouse brain, the macaque motor cortex, and human brain slices of white matter. 

      This is impressive work and represents a leap over existing light-sheet microscopes. As an example, it offers a fivefold higher resolution than mesoSPIM (https://mesospim.org/), a popular platform for imaging large cleared samples. Thus while this work is rooted in optical engineering, it manifests a huge step forward and has the potential to become an important tool in the neurosciences. 

      Strengths: 

      - ExA-SPIM features an exceptional combination of field of view, working distance, resolution, and throughput. 

      - An expanded mouse brain can be acquired with only 15 tiles, lowering the burden on computational stitching. That the brain does not need to be mechanically sectioned is also seen as an important capability. 

      - The image data is compelling, and tracing of neurons has been performed. This demonstrates the potential of the microscope platform. 

      Weaknesses: 

      - There is a general question about the scaling laws of lenses, and expansion microscopy, which in my opinion remained unanswered: In the context of whole brain imaging, a larger expansion factor requires a microscope system with larger volumetric coverage, which in turn will have lower resolution (Figure 1B). So what is optimal? Could one alternatively image a cleared (non-expanded) brain with a high-resolution ASLM system (Chakraborty, Tonmoy, Nature Methods 2019, potentially upgraded with custom objectives) and get a similar effective resolution as the authors get with expansion? This is not meant to diminish the achievement, but it was unclear if the gains in resolution from the expansion factor are traded off by the scaling laws of current optical systems. 

      Paraphrasing the reviewer: Expanding the tissue requires imaging larger volumes and allows lower optical resolution. What has been gained?

      The answer to the reviewer’s question is nuanced and contains four parts. 

      First, optical engineering requirements are more forgiving for lenses with lower resolution. Lower resolution lenses can have much larger fields of view (in real terms: the number of resolvable elements, proportional to ‘etendue’) and much longer working distances. In other words, it is currently more feasible to engineer lower resolution lenses with larger volumetric coverage, even when accounting for the expansion factor. 

      Second, these lenses are also much better corrected compared to higher resolution (NA) lenses. They have a flat field of view, negligible pincushion distortions, and constant resolution across the field of view. We are not aware of comparable performance for high NA objectives, even when correcting for expansion.

      Third, although clearing and expansion render tissues ‘transparent’, there still exist refractive index inhomogeneities which deteriorate image quality, especially at larger imaging depths. These effects are more severe for higher optical resolutions (NA), because the rays entering the objective at higher angles have longer paths in the tissue and will see more aberrations. For lower NA systems, such as ExaSPIM, the differences in paths between the extreme and axial rays are relatively small and image formation is less sensitive to aberrations. 

      Fourth, aberrations are proportional to the index of refraction inhomogeneities (dn/dx). Since the index of refraction is roughly proportional to density, scattering and aberration of light decrease as M^3, where M is the expansion factor. In contrast, the imaging path length through the tissue only increases as M. This produces a huge win for imaging larger samples with lower resolutions.
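
      The scaling argument in the fourth point can be sketched numerically. The snippet below assumes only what is stated above: index inhomogeneity follows density and so falls as 1/M^3, while path length grows as M. Multiplying the two factors to obtain a net 1/M^2 figure is an illustrative simplification, not a result from the manuscript, and the function names are hypothetical.

```python
def inhomogeneity_factor(m: float) -> float:
    """Relative index inhomogeneity after expansion by m (assumed ~ density ~ 1/m**3)."""
    return 1.0 / m**3


def path_length_factor(m: float) -> float:
    """Relative imaging path length through the tissue after expansion by m."""
    return m


def net_aberration_factor(m: float) -> float:
    """Illustrative product of the two factors (an assumption, not a measured quantity)."""
    return inhomogeneity_factor(m) * path_length_factor(m)


if __name__ == "__main__":
    for m in (1.0, 2.0, 3.0, 4.0):
        print(f"M={m:.0f}: inhomogeneity x{inhomogeneity_factor(m):.4f}, "
              f"path x{path_length_factor(m):.0f}, net x{net_aberration_factor(m):.4f}")
```

      Under these assumptions the per-unit-length inhomogeneity falls faster than the path length grows, which is the sense in which expansion is described as a net gain.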

      To our knowledge there are no convincing demonstrations in the literature of diffraction-limited ASLM imaging at a depth of 1 cm in cleared mouse brain tissue, which would be equivalent to the ExA-SPIM imaging results presented in this manuscript.  

      In the discussion of the revised manuscript we discuss these factors in more depth. 

      - It was unclear if 300 nm lateral and 800 nm axial resolution is enough for many questions in neuroscience. Segmenting spines, distinguishing pre- and postsynaptic densities, or tracing densely labeled neurons might be challenging. A discussion about the necessary resolution levels in neuroscience would be appreciated. 

      We have previously shown good results in tracing the thinnest (100 nm thick) axons over cm scales with 1.5 um axial resolution. It is the contrast (SNR) that matters, and the ExaSPIM contrast exceeds the block-face 2-photon contrast, not to mention imaging speed (> 10x).  

      Indeed, for some questions, like distinguishing fluorescence in pre- and postsynaptic structures, higher resolutions will be required (0.2 um isotropic; Rah et al Frontiers Neurosci, 2013). This could be achieved with higher expansion factors.

      This is not within the intended scope of the current manuscript. As mentioned in the discussion section, we are working towards ExA-SPIM-based concepts to achieve better resolution through the design and fabrication of a customized imaging lens that maintains a high volumetric coverage with increased numerical aperture.  

      - Would it be possible to characterize the aberrations that might be still present after whole brain expansion? One approach could be to image small fluorescent nanospheres behind the expanded brain and recover the pupil function via phase retrieval. But even full width half maximum (FWHM) measurements of the nanospheres' images would give some idea of the magnitude of the aberrations. 

      We have now included a supplementary figure highlighting images of small axon segments within distal regions of the brain.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Glaser et al. describe a new selective plane illumination microscope designed to image a large field of view that is optimized for expanded and cleared tissue samples. For the most part, the microscope design follows a standard formula that is common among many systems (e.g. Keller PJ et al Science 2008, Pitrone PG et al. Nature Methods 2013, Dean KM et al. Biophys J 2015, and Voigt FF et al. Nature Methods 2019). The primary conceptual and technical novelty is to use a detection objective from the metrology industry that has a large field of view and a large area camera. The authors characterize the system resolution, field curvature, and chromatic focal shift by measuring fluorescent beads in a hydrogel and then show example images of expanded samples from mouse, macaque, and human brain tissue. 

      Strengths: 

      I commend the authors for making all of the documentation, models, and acquisition software openly accessible and believe that this will help assist others who would like to replicate the instrument. I anticipate that the protocols for imaging large expanded tissues (such as an entire mouse brain) will also be useful to the community. 

      Weaknesses: 

      The characterization of the instrument needs to be improved to validate the claims. If the manuscript claims that the instrument allows for robust automated neuronal tracing, then this should be included in the data. 

      The reviewer raises a valid concern. Our assertion that the resolution and contrast is sufficient for robust automated neuronal tracing is overstated based on the data in the paper. We are hard at work on automated tracing of datasets from the ExA-SPIM microscope. We have demonstrated full reconstruction of axonal arbors encompassing >20 cm of axonal length.  But including these methods and results is out of the scope of the current manuscript. 

      The claims of robust automated neuronal tracing have been appropriately modified.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Smaller questions to the authors: 

      - Would a multi-directional illumination and detection architecture help? Was there a particular reason the authors did not go that route?

      Despite the clarity of the expanded tissue, and the lower numerical aperture of the ExA-SPIM microscope, image quality still degrades slightly towards the distal regions of the brain relative to both the excitation and detection objective. Therefore, multi-directional illumination and detection would be advantageous. Since the initial submission of the manuscript, we have undertaken re-designing the optics and mechanics of the system. This includes provisions for multi-directional illumination and detection. However, this new design is beyond the scope of this manuscript. We now mention this in L254-255 of the Discussion section.

      - Why did the authors not use the same objective for illumination and detection, which would allow isotropic resolution in ASLM? 

      The current implementation of ASLM requires an infinity corrected objective (i.e. conjugating the axial sweeping mechanism to the back focal plane). This is not possible due to the finite conjugate design of the ExA-SPIM detection lens.

      More fundamentally, pushing the excitation NA higher would result in a shorter light sheet Rayleigh length, which would require a smaller detection slit (shorter exposure time, lower signal to noise ratio). For our purposes an excitation NA of 0.1 is an excellent compromise between axial resolution, signal to noise ratio, and imaging speed. 

      For other potentially brighter biological structures, it may be possible to design a custom infinity corrected objective that enables ASLM with NA > 0.1.
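
      The trade-off between excitation NA and light-sheet length can be illustrated with the paraxial Gaussian-beam formula for the Rayleigh length, z_R = λ / (π · NA²). This is a minimal sketch: the 561 nm wavelength is taken from the resolution measurements mentioned elsewhere in this response, while the use of the vacuum wavelength (refractive index of 1) and an ideal Gaussian beam are simplifying assumptions. The function name is hypothetical.

```python
import math


def rayleigh_length_um(wavelength_um: float, na: float) -> float:
    """Paraxial Gaussian-beam Rayleigh length, assuming refractive index 1."""
    return wavelength_um / (math.pi * na**2)


if __name__ == "__main__":
    wavelength_um = 0.561  # assumed excitation wavelength
    for na in (0.05, 0.10, 0.20):
        print(f"NA={na:.2f}: Rayleigh length ~ {rayleigh_length_um(wavelength_um, na):.1f} um")
```

      Because the Rayleigh length scales as 1/NA², doubling the excitation NA shortens the in-focus region of the sheet fourfold, which is the effect the response describes.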

      - Have the authors made any attempt to characterize distortions of the brain tissue that can occur due to expansion? 

      We have not systematically characterized the distortions of the brain tissue pre and post expansion. Imaged mouse brain volumes are registered to the Allen CCF regardless of whether or not the tissue was expanded. It is beyond the scope of this manuscript to include these results and processing methods, but we have confirmed that the ExA-SPIM mouse brain volumes contain only modest deformation that is easily accounted for during registration to the Allen CCF. 

      - The authors state that a custom lens with NA 0.5-0.6 can be designed, featuring similar specifications. Is there a practical design? Wouldn't such a lens be more prone to field curvature?

      This custom lens has already been designed and is currently being fabricated. The lens maintains a similar space bandwidth product as the current lens (increased numerical aperture but over a proportionally smaller field of view). Over the designed field of view, field curvature is <1 µm. However, including additional discussion or results of this customized lens is beyond the scope of this manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      System characterization: 

      - Please state what wavelength was used for the resolution measurements in Figure 2.

      An excitation wavelength of 561 nm was used. This has been added to the manuscript text.

      - The manuscript highlights that a key advance for the microscope is the ability to image over a very large 13 mm diameter field of view. Can the authors clarify why they chose to characterize resolution over an 8 mm diameter field rather than the full area?

      The 13 mm diameter field of view refers to the diagonal of the 10.6 x 8.0 mm field of view. The results presented in Figure 1c are with respect to the horizontal x direction and vertical y direction. A note indicating that the 13 mm is with respect to the diagonal of the rectangular imaging field has been added to the manuscript text. The results were presented in this way to present the axial and lateral resolution as a function of y (the axial sweeping direction).
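
      The relation between the rectangular field and the quoted 13 mm figure can be checked directly. This small sketch uses only the 10.6 x 8.0 mm dimensions given above; the function names are hypothetical.

```python
import math


def field_diagonal_mm(width_mm: float, height_mm: float) -> float:
    """Diagonal of a rectangular imaging field."""
    return math.hypot(width_mm, height_mm)


def field_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular imaging field."""
    return width_mm * height_mm


if __name__ == "__main__":
    w, h = 10.6, 8.0
    print(f"diagonal ~ {field_diagonal_mm(w, h):.1f} mm")
    print(f"area ~ {field_area_mm2(w, h):.1f} mm^2")
```

      The diagonal comes out at roughly 13.3 mm and the area at roughly 85 mm^2, consistent with the field of view quoted in the public review.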

      - The resolution estimates seem lower than I would expect for a 0.30 NA lens (which should be closer to ~850 nm for 515 nm emission). Could the authors clarify the discrepancy? Is this predicted by the Zemax model and due to using the lens in immersion media, related to sampling size on the camera, or something else? It would be helpful if the authors could overlay the expected diffraction-limited performance together with the plots in Figure 2C. 

      As mentioned previously, the resolution measurements were performed with 561 nm excitation and an emission bandpass of ~573 – 616 nm (595 nm average). Based on this we would expect the full width half maximum resolution to be ~975 nm. The resolution is in fact limited by sampling on the camera. The 3.76 µm pixel size, combined with the 5.0X magnification, results in a sampling of 752 nm. Based on the Nyquist criterion, the resolution is limited to ~1.5 µm. We have added clarifying statements to the text.
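
      The sampling argument can be reproduced with a few lines of arithmetic. The sketch below uses the pixel size, magnification, and wavelengths quoted above. The λ/(2·NA) expression is a common rule-of-thumb estimate included only for comparison; the exact prefactor and NA behind the ~975 nm figure are not stated in this response, so that part is an assumption. Function names are hypothetical.

```python
def sample_pitch_nm(pixel_um: float, magnification: float) -> float:
    """Camera pixel size projected into sample space, in nanometres."""
    return pixel_um * 1000.0 / magnification


def nyquist_limited_resolution_nm(pitch_nm: float) -> float:
    """Smallest resolvable period when at least two samples per period are required."""
    return 2.0 * pitch_nm


def rule_of_thumb_resolution_nm(wavelength_nm: float, na: float) -> float:
    """Rough diffraction estimate, lambda / (2 * NA); an assumed approximation."""
    return wavelength_nm / (2.0 * na)


if __name__ == "__main__":
    pitch = sample_pitch_nm(3.76, 5.0)
    print(f"sample-space pixel pitch ~ {pitch:.0f} nm")
    print(f"Nyquist-limited resolution ~ {nyquist_limited_resolution_nm(pitch):.0f} nm")
    print(f"diffraction estimate at 595 nm, NA 0.30 ~ {rule_of_thumb_resolution_nm(595.0, 0.30):.0f} nm")
    print(f"diffraction estimate at 515 nm, NA 0.30 ~ {rule_of_thumb_resolution_nm(515.0, 0.30):.0f} nm")
```

      With these inputs the sampling limit (about 1.5 µm) is coarser than either diffraction estimate, which is why the measured resolution is set by the camera rather than by the optics.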

      - I'm confused about the characterization of light sheet thickness and how it relates to the measured detection field curvature. The authors state that they "deliver a light sheet with NA = 0.10 which has a width of 12.5 mm (FWHM)." If we estimate that light fills the 0.10 NA, it should have a beam waist (2wo) of ~3 microns (assuming Gaussian beam approximations). Although field curvature is described as "minimal" in the text, it is still ~10-15 microns at the edge of the field for the emission bands for GFP and RFP proteins. Given that this is 5X larger than the light sheet thickness, how do the authors deal with this? 

      The generated light sheet is flat, with a thickness of ~ 3 µm. This flat light sheet will be captured in focus over the depth of focus of the detection objective. The stated field curvature is within 2.5X the depth of focus of the detection lens, which is equivalent to the “Plan” specification of standard microscope objectives.
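
      The ~3 µm sheet thickness is of the order expected from the paraxial Gaussian-beam relation w0 = λ / (π · NA), as the reviewer's estimate suggests. This is a minimal sketch, assuming a 561 nm excitation wavelength, a refractive index of 1, and a beam that fills the full 0.10 NA; none of these choices is specified for the light-sheet thickness in this response. The function name is hypothetical.

```python
import math


def beam_waist_diameter_um(wavelength_um: float, na: float) -> float:
    """Full waist (2 * w0) of a paraxial Gaussian beam, assuming refractive index 1."""
    w0 = wavelength_um / (math.pi * na)
    return 2.0 * w0


if __name__ == "__main__":
    print(f"2*w0 ~ {beam_waist_diameter_um(0.561, 0.10):.1f} um")
```

      Shorter excitation wavelengths give a proportionally thinner waist under the same assumptions.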

      - In Figure 2E, it would be helpful if the authors could list the exposure times as well as the total voxels/second for the two-camera comparison. It's also worth noting that the Sony chip used in the VP151MX camera was released last year whereas the Orca Flash V3 chosen for comparison is over a decade old now. I'm confused as to why the authors chose this camera for comparison when they appear to have a more recent Orca BT-Fusion that they show in a picture in the supplement (indicated as Figure S2 in the text, but I believe this is a typo and should be Figure S3). 

      This is a useful addition, and we have added exposure times to the plot. We have also added a note that the Orca Flash V3 is an older generation sCMOS camera and that newer variants exist, including the Orca BT-Fusion. The BT-Fusion has a read noise of 1.0 e- rms versus 1.6 e- rms, and a peak quantum efficiency of ~95% vs. 85%. Based on the discussion in Supplementary Note S1, we do not expect that these differences in specifications would dramatically change the data presented in the plot. In addition, the typo in Figure S2 has been corrected to Figure S3.

      - In Table S1, the authors note that they only compare their work to prior modalities that are capable of providing <= 1 micron resolution. I'm a bit confused by this choice given that Figure 2 seems to show the resolution of ExA-SPIM as ~1.5 microns at 4 mm off center (1/2 their stated radial field of view). It also excludes a comparison with the mesoSPIM project which at least to me seems to be the most relevant prior to this manuscript. This system is designed for imaging large cleared tissues like the ones shown here. While the original publication in 2019 had a substantially lower lateral resolution, a newer variant, Nikita et al bioRxiv (which is cited in general terms in this manuscript, but not explicitly discussed) also provides 1.5-micron lateral resolution over a comparable field of view. 

      We have updated the table to include the benchtop mesoSPIM from Nikita et al., Nature Communications, 2024. Based on this published version of the manuscript, the lateral resolution is 1.5 µm and axial resolution is 3.3 µm. Assuming the Iris 15 camera sensor, with the stated 2.5 fps, the volumetric rate (megavoxels/sec) is 37.41.
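
      The volumetric rate quoted above follows from pixels per frame multiplied by frame rate. The sketch below assumes an Iris 15 sensor format of 5056 x 2960 pixels; that format is not given in this response and is an assumption here, chosen because it reproduces the stated value. The function name is hypothetical.

```python
def megavoxels_per_second(width_px: int, height_px: int, frames_per_second: float) -> float:
    """Voxel throughput when each camera frame contributes one image plane."""
    return width_px * height_px * frames_per_second / 1e6


if __name__ == "__main__":
    # Assumed sensor format (5056 x 2960) with the stated 2.5 fps.
    rate = megavoxels_per_second(5056, 2960, 2.5)
    print(f"volumetric rate ~ {rate:.2f} megavoxels/sec")
```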

      - The authors state that, "We systematically evaluated dehydration agents, including methanol, ethanol, and tetrahydrofuran (THF), followed by delipidation with commonly used protocols on 1 mm thick brain slices. Slices were expanded and examined for clarity under a macroscope." It would be useful to include some data from this evaluation in the manuscript to make it clear how the authors arrived at their final protocol. 

      Additional details on the expansion protocol may be included in another manuscript.

      General comments: 

      There is a tendency in the manuscript to use negative qualitative terms when describing prior work and positive qualitative terms when describing the work here. Examples include: 

      - "Throughput is limited in part by cumbersome and error-prone microscopy methods". While I agree that performing single neuron reconstructions at a large scale is a difficult challenge, the terms cumbersome and error-prone are qualitative and lacking objective metrics.

      We have revised this statement to be more precise, stating that throughput is limited in part by the speed and image quality of existing microscopy methods.

      - The resolution of the system is described in several places as "near-isotropic" whereas prior methods were described as "highly anisotropic". I agree that the ~1:3 lateral to axial ratio here is more isotropic than the 1:6 ratio of the other cited publications. However, I'm not sure I'd consider a 3-fold worse axial than lateral resolution to be "near" isotropic.

      We agree that the term near-isotropic is ambiguous. We have modified the text accordingly, removing the term near-isotropic and where appropriate stating that the resolution is more isotropic than that of other cited publications.

      - In the manuscript, the authors describe the photobleaching in their imaging conditions as "negligible". Figure S5 seems to show a loss of 60% fluorescence after 2000 exposures (which in the caption is described as "modest"). I'd suggest removing these qualitative terms and just stating the values.

      We agree and have changed the text accordingly.

      - The results section for Figure 5 is titled "Tracing axons in human neocortex and white matter". Although this section states "larger axons (>1 um) are well separated... allowing for robust automated and manual tracing" there is no data for any tracing in the manuscript. Although I agree that the images are visually impressive, I'm not sure that this claim is backed by data.

      We have now removed the text in this section referring to automated and manual tracing.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines for individual neurons, the authors show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron has only a partial phenotype. The authors use calcium imaging to show that the DAN-g1 is not the only DAN responding to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role for the associative memory. DAN-f1, which does not respond to salt, is able to lead to the formation of a memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, when silenced together with DAN-g1, it enhances the memory deficit of DAN-g1. Overall, this work brings evidence of a complex interaction between DL1 DANs in both the encoding of salt signals and their teaching role in associative learning, with none of them being individually necessary and sufficient for both functions.

      Strengths:

      Overall, the manuscript contributes interesting results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines, which allow their function to be tested individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association to it. Proper genetic controls are carried out throughout the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, but the authors discuss these differences appropriately. In general, the optogenetic approach is more appropriate as developmental compensations are not of major interest for the question investigated.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set is necessary in behavioral assays (with a partial phenotype). No manipulation completely abolishes the salt-odor association, leaving important open questions on the identity of the neural circuits involved in this behavior.

      The EM data analysis reveals a non-trivial organization of sensory inputs into DANs, but it is difficult to extrapolate a link to the functional data presented in the paper.

      We would like to once again thank Reviewer 1 for the positive assessment of our work and for the valuable suggestions provided on the first revision of the manuscript. In this second revision, we have addressed the linguistic issues and most of the minor comments as recommended. We now hope that the current version of our manuscript meets the reviewer’s expectations both in terms of language and content.

      Reviewer #2 (Public review):

      Summary:

      In this work the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act in a partially redundant manner, and that single cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, an effect that was specific to salt and not seen with the other aversive teaching stimuli tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs this represents a very comprehensive study linking the structural, functional and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows the cellular substrates and pathways of aversive learning to be defined down to the single cell level in a system without much redundancy. It therefore sets the basis for conducting even more sophisticated experiments and, together with the neat connectome analysis, opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and balanced, putting their data in the appropriate context. The authors also implemented neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      Previous comments were fully addressed by the authors.

      We sincerely thank Reviewer 2 for the positive evaluation of our work. We are glad that our responses in the first revision addressed the previous concerns and appreciate the reviewer’s constructive feedback once again.

      Reviewer #3 (Public review):

      Summary:

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. But the authors go beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, is necessary for salt learning when either ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for two of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      We would also like to thank Reviewer 3 for the positive assessment of our work. Many of the constructive comments provided were incorporated into the first revision, contributing significantly to the improved clarity and overall quality of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some minor comments (and some semantics that could be addressed to improve the manuscript)

      Title: is the title correct given that c1 and d1 do not really signal punishment?

      We think the title is correct and would like to keep it as it is.

      L72 striatum misspelled

      We have corrected the error.

      L74 constitute instead of provide?

      We made the suggested modification in the text.

      L129: "But can these four individual DANs also process other sensory modalities?" other than what? What was used before?

      We have made the required change, which now allows us to contrast somatosensory and chemosensory information.

      L172: (Please refer to the discussion regarding the partial reduction of the memory); it would be more natural to explain this briefly here, or to add a sentence before this parenthesis that points to the effect

      We made the requested change in the manuscript and added a short sentence before the parenthesis.

      L182: "DL1 neurons convey a dopaminergic aversive teaching signal" you cannot make this statement from just TH-GAL4!

      We agree. We have therefore completely revised the sentence, restricted its scope further, and now also refer to additional published larval and adult data.

      L264: "possible redundancy among" I don't think you are testing a redundancy here, it is more likely a developmental compensation.

      We made the requested change in the sentence and added a potential developmental compensation as an interpretation of our results.

      L296: "to determine if the activation of individual DL1 DANs signals aspects of the natural high salt punishment," - how can the optogenetic activation tell something about aspects of the natural salt punishment? I understand the fact that salt is present, but still I find it inaccurate

      Our approach is based on the framework established by Bertram Gerber and colleagues over the past two decades in larval Drosophila research. According to this logic, memory recall is dependent on the specific properties of the test context, particularly the type and concentration of the stimulus presented on the test plate. Aversive memory retrieval occurs only when the test conditions closely match those of the training stimulus. Consequently, the larva's behavior on the test plate serves as an indicator of the memory content being recalled. We therefore adhere to this established methodology (Gerber & Hendel, 2006; Schleyer et al., 2011; Schleyer et al., 2015).

      L307 "DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching" you cannot conclude that given that f1 does not even respond to salt. I understand the logic of the salt during test, but I think it is still a stretched interpretation

      We agree and thus have deleted the sentence.

      L310 "Individual DL1 DANs are acutely necessary" this is too general, it seems that only one is

      We have changed the title and now clearly state that this is only one DAN of the DL1 cluster.

      Reviewer #2 (Recommendations for the authors):

      In Fig.6 the text flow could be optimized as the authors first mention Fig. 6E,F before they follow up with Fig. 6A-D.

      Thanks for bringing this up – we changed it in the revised version of the manuscript. Now 6A-D is mentioned first.

      In Fig.6 the finding that optogenetic inactivation but not ablation of DAN-g1 slightly but significantly reduces aversive salt learning suggests that there is an individual contribution of this DAN in this paradigm. The authors emphasize redundancy of DL1 DANs although the effect size seems comparable between DAN-g1 and DAN-f1,g1 silencing.

      In response to this concern and the related one raised by Reviewer 1, we have revised the section title and removed the final sentence of the preceding section to avoid placing emphasis on the potential redundancy of DL1 DANs within this results section.

      Reviewer #3 (Recommendations for the authors):

      The authors replied to each issue I raised, and revised their manuscript accordingly. In particular, regarding my major concern (the sufficiency of the neurons for salt-"specific" memories), I think the authors found a good solution.

      I have no further comments.

      We sincerely thank the reviewer for the positive feedback on our revision. We are pleased that the revised manuscript meets the expectations and appreciate the time and effort invested in the review process.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In Causal associations between plasma proteins and prostate cancer: a Proteome-Wide Mendelian Randomization, the authors present a manuscript which seeks to identify novel markers for prostate cancer through analysis of large biobank-based datasets and to extend this analysis to potential therapeutic targets for drugs. This is an area that is already extensively researched, but remains important, due to the high burden and mortality of prostate cancer globally.

      Strengths:

      The main strengths of the manuscript are the identification and use of large biobank data assets, which provide large numbers of cases and controls, essential for achieving statistical power. The databases used (deCODE, FinnGen, and the UK Biobank) allow for robust numbers of cases and controls. The analytical method chosen, Mendelian Randomization, is appropriate to the problem. Another strength is the integration of multi-omic datasets, here using protein data as well as GWAS sources to integrate genomic and proteomic data.

      Thank you for your positive feedback on the overall quality of our work. We greatly appreciate the time and effort you invested in reviewing our manuscript.

      Weaknesses:

      The main weaknesses of the manuscript relate to the following areas:

      (1) The failure of the study to analyse the data in the context of other closely related conditions such as benign prostatic hyperplasia (BPH) or lower urinary tract symptoms (LUTS), which have some pathways and biomarkers in common, such as inflammatory pathways (including complement) and specific markers such as KLK3. As a consequence, it is not possible for readers to know whether the findings are specific to prostate cancer or whether they are generic to prostate dysfunction. Given the prevalence of prostate dysfunction (half of men reaching their sixth decade), the potential for false positives and overtreatment from non-specific biomarkers is a major problem, resulting in the evidence presented in this manuscript being weak. Other researchers have addressed this issue using the same data sources as presented here, for example, in this paper, looking at BPH in the UK Biobank population. https://www.nature.com/articles/s41467-018-06920-9

      Thank you for your valuable comment. We fully agree that biomarker development must prioritize specificity to avoid overtreatment. Our study is a foundational step toward identifying potential therapeutic targets or complementary biomarkers for prostate cancer (PCa), not a direct endorsement of these proteins for standalone clinical diagnosis. Mendelian randomization (MR) analysis strengthens causal inference by design, and we further ensured robustness through sensitivity analyses and multiple-testing control (e.g. MR-Egger regression for pleiotropy, Bonferroni correction for multiple testing). These methods help distinguish true causal effects from nonspecific associations. Importantly, while PSA’s lack of specificity is widely recognized, its role in reducing PCa mortality underscores the value of biomarker-driven screening. Our findings align with the need to integrate multiple markers (e.g. combining a novel protein with PSA) to improve diagnostic precision. Translating these causal insights into clinical tools remains challenging but represents a necessary next step, and we emphasize that this work provides a rigorous starting point for future validation studies.

      (2) There is no discussion of Gleason scores with regard to either biomarkers or therapies, and a general lack of discussion around indolent disease as compared with more aggressive variants. These are crucial issues with regard to the triage and identification of genomically aggressive localized prostate cancers. See, for example, the work set out in: https://doi.org/10.1038/nature20788

      Thank you for pointing this out. We acknowledge that our original analysis did not directly address this critical issue due to a key data limitation: the publicly available GWAS summary statistics for PCa (from openGWAS and FinnGen) do not provide genetic associations stratified by phenotypic severity or molecular subtypes. This limitation precluded MR analysis of proteins specifically linked to aggressive disease. To partially bridge this gap, we have integrated evidence from recent studies into the revised Discussion section to explore the relevance of potential biomarkers to aggressive PCa.

      (3) An additional issue is that the field of PCa research is fast-moving. The manuscript cites ~80 references, but too few of these are from recent studies, and many important and relevant papers are not included. The manuscript would be much stronger if it compared and contrasted its findings with more recent studies of PCa biomarkers and targets, especially those concerned with multi-omics and those including BPH.

      Thank you for your professional comments. We have rigorously updated the manuscript to include more recent publications and we systematically compare and contrast our findings with these recent studies in the revised Discussion section.

      (4) The Methods section provides no information on how the Controls were selected. There is no Table providing cohort data to allow the reader to know whether there were differences in age, BMI, ethnic grouping, social status or deprivation, or smoking status, between the Cases and Controls. These types of data are generally recorded in Biobank data, so this sort of analysis should be possible, or if not, the authors' inability to construct an appropriately matched set of Controls should be discussed as a Limitation.

      We thank the reviewer for raising this important methodological concern. We have expanded the Limitations section to state this explicitly.

      Reviewer #2 (Public review):

      This is potentially interesting work, but the analyses are attempted in a rather scattergun way, with little evident critical thought. The structure of the work (Results before Methods) can work in some manuscripts, but it is not ideal here. The authors discuss results before we know anything about the underlying data that the results come from. It gives the impression that the authors regard data as a resource to be exploited, without really caring where the data comes from. The methods can provide meaningful insights if correctly used, but while I don't have reasons to doubt that the analyses were conducted correctly, findings are presented with little discussion or interpretation. No follow-up analyses are performed.

      In summary, there are likely some gems here, but the whole manuscript is essentially the output from an analytic pipeline.

      We thank the reviewer for the thoughtful evaluation of our work.

      Taking the researchers' aims in turn:

      (1) Meta-GWAS - while combining two datasets together can provide additional insights, the contribution of this analysis above existing GWAS is not clear. The PRACTICAL consortium has already reported the GWAS of 70% of these data. What additional value does this analysis provide? (Likely some, but it's not clear from the text.) Also, the presentation of results is unclear - authors state that only 5 gene regions contained variants at p<5×10⁻⁸, but Figure 1 shows dozens of hits above 5×10⁻⁸. Also, the red line in Figure 1 (supposedly at 5×10⁻⁸) is misplaced.

      Thank you very much for your feedback. Although the PRACTICAL consortium constitutes the majority of the PCa GWAS data, our meta-analysis integrating FinnGen data enhanced statistical power, enabling more robust detection of low-frequency variants (those with a low minor allele frequency). Moreover, FinnGen's Finnish ancestry (a genetic isolate) helps distinguish population-specific effects. In the Results, we listed only the top 5 gene regions containing variants at p < 5×10⁻⁸. We apologize for not noticing that the red line was not displayed correctly in the original figures included in the manuscript. We have corrected it in the revised manuscript.

      (2) Cross-phenotype analysis. It is not really clear what this analysis is, or why it is done. What is the iCPAGdb? A database? A statistical method? Why would we want to know cross-phenotype associations? What even are these? It seems that the authors have taken data from an online resource and have written a paragraph based on this existing data with little added value.

      We thank you for raising this issue. The iCPAGdb (interactive Cross-Phenotype Analysis of GWAS database) is an integrative platform that systematically identifies cross-phenotype associations and evaluates genetic pleiotropy by leveraging LD-proxy associations from the NHGRI-EBI GWAS Catalog. The pathogenesis and progression of prostate cancer constitute a complex pathophysiological continuum characterized by dynamic multisystem interactions, extending beyond singular molecular pathway dysregulation to encompass coordinated disruptions across endocrine regulation, immune microenvironment remodeling, and metabolic reprogramming. Cross-phenotype analysis is therefore indispensable for discriminating primary pathogenic drivers from secondary compensatory responses, ultimately informing the development of precision therapeutic strategies.

      (3) PW-MR. I can see the value of this work, but many details are unclear. Was this a two-sample MR using PRACTICAL + FinnGen data for the outcome? How many variants were used in key analyses? Again, the description of results is sparse and gives little added value.

      We thank you for raising this issue. Two-sample MR refers to an analytical design where genetic instruments for the exposure (plasma proteins) and genetic associations with the outcome (PCa) are derived from non-overlapping populations. This ensures complete sample independence between exposure and outcome datasets to avoid confounding biases, regardless of whether the outcome data originate from single or multiple cohorts. The meta-analysis of PRACTICAL and FinnGen GWAS generates 27,210 quality-controlled variants (p < 5×10⁻⁸, MAF ≥ 1%, LD-clumped r² < 0.1) used in key analyses.
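
      To make the two-sample design concrete, the following minimal Python sketch shows how per-variant Wald ratios and a fixed-effect inverse-variance-weighted (IVW) estimate are commonly computed from summary statistics. It is an illustration under stated assumptions, not the pipeline used in the manuscript: the function names and the toy effect sizes are hypothetical, the instruments are assumed to be independent, and the standard errors use a first-order approximation that ignores uncertainty in the exposure estimates.

```python
import math
from typing import List, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Per-variant causal estimate (outcome effect / exposure effect).

    The standard error is a first-order approximation that treats the
    variant-exposure effect as known without error.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(betas_exposure: List[float], betas_outcome: List[float],
                 ses_outcome: List[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate across independent instruments.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(betas_exposure, betas_outcome, ses_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(betas_exposure, ses_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Hypothetical summary statistics for two instruments (toy values only).
    bx = [0.10, 0.20]   # variant -> protein level (exposure GWAS / pQTL)
    by = [0.05, 0.10]   # variant -> PCa log-odds (outcome GWAS)
    se = [0.01, 0.01]   # standard errors of the outcome effects
    print(wald_ratio(bx[0], by[0], se[0]))
    print(ivw_estimate(bx, by, se))
```

      Because the exposure and outcome effects enter only as summary statistics, the two inputs can come from entirely separate cohorts, which is the defining feature of the two-sample design described above.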

      (4) Colocalization - seems clear to me.

      (5) Additional post-GWAS analyses (pathway + druggability) - again, the analyses seem to be performed appropriately, although little additional insight other than the reporting of output from the methods.

      The post-MR druggability and pathway analyses serve two primary scientific purposes: (1) therapeutic prioritization - systematically evaluating which MR-identified proteins represent tractable drug targets (either through existing FDA-approved agents or compounds in clinical development) with direct relevance to cancer or PCa management, and (2) mechanistic hypothesis generation - mapping these candidate proteins to coherent biological pathways to guide future functional validation studies investigating their causal roles in prostate carcinogenesis.

      Minor points:

      (6) The stated motivation for this work is "early detection". But causality isn't necessary for early detection. If the authors are interested in early detection, other analysis approaches are more appropriate.

      We appreciate your insightful feedback. While early detection is one motivation for this work, our primary goal extends to identifying causally implicated proteins that may serve as intervention targets for PCa prevention or therapy. Establishing causality is critical for distinguishing biomarkers that drive disease pathogenesis from those that are secondary to disease progression, as the former holds greater specificity for early detection and prioritization of therapeutic targets. While we acknowledge that validation for early detection may require additional methodologies, MR analysis provides a foundational step by prioritizing candidate proteins with causal links to disease. This approach ensures that downstream efforts focus on biomarkers and targets with the greatest potential to alter disease trajectories, rather than merely correlative markers.

      (7) The authors state "193 proteins were associated with PCa risk", but they are looking at MR results - these analyses test for disease associations of genetically-predicted levels of proteins, not proteins themselves.

      In MR, the exposure of interest is the lifelong effect of genetically predicted protein levels. This approach is designed to infer causality while avoiding confounding and reverse causation, as genetic variants are fixed at conception and unaffected by disease processes. When we state “193 proteins were associated with PCa risk,” we specifically refer to proteins whose genetically predicted levels (based on instrument SNPs from protein QTLs) show causal links to PCa. Importantly, MR does not measure the direct association between observed protein concentrations and disease. Instead, it estimates the lifelong causal effect of protein levels predicted by genetics. This distinction is critical for disentangling cause from consequence. For example, a protein elevated due to tumor progression would not be identified as causal in MR if its genetic predictors are unrelated to PCa risk.

      We acknowledge that clinical translation requires further validation of these proteins in observational studies measuring actual protein levels. However, MR provides a robust first step by prioritizing candidates with causal roles, thereby reducing the risk of investing in biomarkers confounded by disease processes.

    1. Author response:

      We thank the reviewing editors, senior editors, and reviewers for their time, efforts, and constructive feedback. We believe the points raised are addressable and we would like to proceed with a revised submission for further review. Specifically, we plan the following revisions:

      Editor’s Comments

      We will clarify study definitions to ensure the meaning of "5-year crude overall survival time" is explicit for readers.

      Reviewer 1 Comments

      - Clarify and supplement the work with detailed sources of study origin (cancer registries or single-center cohorts).

      - Conduct a multi-level hierarchical meta-analysis to address concerns of ecological fallacy in interpreting results.

      - Perform an ecological sensitivity analysis and clarify findings regarding small study effects.

      - Expand the search base significantly by including African local databases; preliminary searches have identified over 50 potentially eligible doctoral theses, dissertations, local journal articles, and gray literature, potentially adding data from five or more additional countries.

      Reviewer 2 Comments

      - Conduct subgroup analyses by sex and assess the influence of the percentage of males in mixed cohorts.

      - Enhance the limited meta-analysis and provide supplementary full forest plots for all analyses.

      - Clarify phrasing in sections identified by the reviewer.

      Additional Planned Clarifications and Analyses

      - Elucidate the role of cumulative meta-analysis in mitigating lead-time bias.

      - Include supplementary cumulative meta-analysis based on the year of investigation (instead of publication year).

      - Perform subgroup analyses by clinical staging, TNM grading, and treatment modalities where data from ≥10 studies is available.

      - Expand discussion on the merits of quality assessment versus risk of bias evaluation in large scale epidemiological and observational studies, in line with other studies of this scale.

      - Condense the comparison with 2018 estimates, as per reviewer suggestions.
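
      The cumulative meta-analysis listed above can be sketched as follows. This is a simplified illustration, not the planned analysis: it pools untransformed estimates with fixed-effect inverse-variance weights, whereas survival proportions are usually transformed and pooled with random-effects models. The function names and the example numbers are hypothetical.

```python
import math
from typing import List, Tuple

# (year of investigation, estimate, standard error)
Study = Tuple[int, float, float]


def pooled_fixed_effect(studies: List[Study]) -> Tuple[float, float]:
    """Inverse-variance-weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, _, se in studies]
    total = sum(weights)
    estimate = sum(w * est for w, (_, est, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


def cumulative_meta_analysis(studies: List[Study]) -> List[Tuple[int, float, float]]:
    """Re-pool after adding each study in chronological order.

    Returns one (year, pooled estimate, pooled SE) row per added study,
    so that drift in the pooled estimate over time becomes visible.
    """
    ordered = sorted(studies, key=lambda s: s[0])
    rows = []
    for i in range(1, len(ordered) + 1):
        estimate, se = pooled_fixed_effect(ordered[:i])
        rows.append((ordered[i - 1][0], estimate, se))
    return rows


if __name__ == "__main__":
    # Hypothetical 5-year survival proportions (toy values only).
    toy = [(2012, 0.40, 0.10), (2005, 0.20, 0.10), (2018, 0.45, 0.05)]
    for year, estimate, se in cumulative_meta_analysis(toy):
        print(year, round(estimate, 3), round(se, 3))
```

      Sorting on the year of investigation rather than the publication year only changes the key stored in the first field of each tuple.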

      Clarification Regarding SSA vs. AU Classification

      We do not intend to compare survival between "Sub-Saharan Africa" (SSA) and North Africa, as this binary classification is historically rooted and does not reflect current African Union (AU) administrative or policy groupings. Our regional analyses will adhere to the AU’s contemporary regional framework to better reflect political, cultural, and healthcare system realities.

      On Registry Data

      We will clarify that we will not extract raw registry data, as such data is typically unprocessed and does not provide 5-year overall survival metrics. Extracting raw, individual-level data from registries or vital statistics systems therefore falls outside the methodological scope of a meta-analysis. Meta-analyses are designed to synthesize published survival estimates or those available from reports where survival analyses have already been conducted. Utilizing raw surveillance data would require primary data processing and survival analysis, effectively creating new data rather than synthesizing existing results. This would represent a distinct study design, such as a pooled analysis or original cohort study, rather than a meta-analysis. Where registry reports present summary survival estimates (e.g., 5-year overall survival) in a format compatible with meta-analysis, we will certainly include them.

      All planned additional analyses will depend on data quality, consistency, and feasibility for pooling using state-of-the-art statistical techniques. Where pooling is not possible, we will transparently report limitations.

    1. Author response:

      We thank all the reviewers for their thoughtful comments on our submitted manuscript.

      The main points made by all three reviewers were: to discuss the components of the omitted synapses and explore parameter sensitivity and broader physiological variability; to provide deeper physical insights into phase separation; to clarify terminology and provide better presentation and context in relation to previous studies.

      We fully agree with the first point, that parameter sensitivity and broader physiological variability should be explored. Our model omits scaffold proteins such as GKAP, Shank and Homer, which are present at the bottom of the PSD hierarchy. In addition, there are many other interactions in PSDs whose affinity is altered by phosphorylation, and the phase separation state of the condensate is likely to be affected by ionic concentration and other environmental factors. We will include a more detailed discussion of these environmental factors and of the limitations of our study in the Discussion section. Furthermore, regarding parameter sensitivity, the reviewer is right that the membrane potential parameter is important, since it directly regulates the difference between 3D and 2D systems. We plan to verify this by varying the strength of the membrane potential and rerunning the simulations to see how much it affects the morphology of the condensates.

      The second point is that we should provide deeper physical insight into phase separation in different dimensions. Directly estimating the entropy of the system would not be straightforward because of the nature of the model. However, as pointed out, the difference in phase behavior can be elucidated through various simplified theories such as the lattice model. In this context, the reduced coordination number in 2D systems compared to 3D systems, and the decreased pseudo-attractive force due to the depletion effect, can offer rationalizations. We would like to add some theoretical discussion of these aspects with equations.
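
      As one possible starting point for such a discussion, a mean-field (regular-solution) lattice model makes the coordination-number argument explicit. This is a textbook sketch under stated assumptions, not the authors' derivation: here \phi is the volume fraction of the condensing species, z the lattice coordination number, and w an effective exchange energy per contact, and the specific lattices (square, z = 4; simple cubic, z = 6) are illustrative choices.

```latex
% Mean-field free energy per lattice site
\frac{f(\phi)}{k_B T} = \phi \ln \phi + (1-\phi)\ln(1-\phi) + \chi\,\phi(1-\phi),
\qquad \chi = \frac{z\,w}{k_B T}

% Spinodal condition f''(\phi) = 0
\frac{1}{\phi} + \frac{1}{1-\phi} - 2\chi = 0
\;\;\Longrightarrow\;\;
\chi_c = 2 \ \text{at} \ \phi_c = \tfrac{1}{2},
\qquad k_B T_c = \frac{z\,w}{2}

% Ratio of critical temperatures for a square (2D) and a simple cubic (3D) lattice
\frac{T_c^{\mathrm{2D}}}{T_c^{\mathrm{3D}}}
  = \frac{z_{\mathrm{2D}}}{z_{\mathrm{3D}}}
  = \frac{4}{6} = \frac{2}{3}
```

      At this level of approximation, fewer neighbors per site lower the critical temperature at fixed contact energy, so stronger interactions are needed to demix in 2D than in 3D. Mean-field theory is known to overestimate critical temperatures, particularly in 2D, so the ratio should be read as a qualitative trend only.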

      Third, we will clarify terminology and provide a better explanation in relation to previous studies. In some parts of the manuscript, such as the descriptions of complexes containing receptors, the terminology was inconsistent and annotations were missing from figures. We will improve the wording and visualization in the text for further clarity and add relevant references, as suggested by the reviewers.

      Also, as additionally suggested, scripts for the simulation and analysis together with the initial structure obtained will be deposited to Zenodo or GitHub.

    1. Author response:

      eLife Assessment

      This work presents an important technical advancement with the release of MorphoNet 2.0, a user-friendly, standalone platform for 3D+T segmentation and analysis in biological imaging. The authors provide convincing evidence of the tool's capabilities through illustrative use cases, though broader validation against current state-of-the-art tools would strengthen its position. The software's accessibility and versatility make it a resource that will be of value for the bioimaging community, particularly in specialized subfields.

      We would like to thank the editors and reviewers for their careful and constructive evaluation of our manuscript “MorphoNet 2.0: An innovative approach for qualitative assessment and segmentation curation of large-scale 3D time-lapse imaging datasets”. We are grateful for the positive assessment of MorphoNet 2.0 as a valuable and accessible tool for the bioimaging community, and for the recognition of its technical advancements, particularly in the context of complex 3D+t segmentation tasks.

      The reviewers have highlighted several important points that we will address in the revised manuscript. These include:

      - The need for a clearer demonstration that improvements in unsupervised quality metrics correspond to actual improvements in segmentation quality. In response, we will provide comparisons with gold standard annotations where available and clarify how to interpret metric distributions.
      - The potential risk of circular logic when using unsupervised metrics to guide model training. We now explicitly discuss this limitation and emphasize the importance of external validation and expert input.
      - The value of comparing MorphoNet 2.0 to other tools such as FIJI and napari. We will include a comparative table to help readers understand MorphoNet’s positioning and complementarity.
      - The importance of clearer documentation and terminology. We will overhaul the help pages, standardize plugin naming, and add a glossary-style table to the manuscript.
      - Suggestions for future developments, such as mesh export and interoperability with napari, which we will explore for the revision.

      We appreciate the detailed feedback on both scientific and editorial aspects, including corrections to figures and text, and we will integrate all suggested revisions to improve the manuscript’s clarity and impact. We are confident that these changes will strengthen the manuscript and enhance the utility of MorphoNet 2.0 for the community.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present a substantial improvement to their existing tool, MorphoNet, intended to facilitate assessment of 3D+t cell segmentation and tracking results, and curation of high-quality analysis for scientific discovery and data sharing. These tools are provided through a user-friendly GUI, making them accessible to biologists who are not experienced coders. Further, the authors have re-developed this tool to be a locally installed piece of software instead of a web interface, making the analysis and rendering of large 3D+t datasets more computationally efficient. The authors evidence the value of this tool with a series of use cases, in which they apply different features of the software to existing datasets and show the improvement to the segmentation and tracking achieved.

      While the computational tools packaged in this software are familiar to readers (e.g., cellpose), the novel contribution of this work is the focus on error correction. The MorphoNet 2.0 software helps users identify where their candidate segmentation and/or tracking may be incorrect. The authors then provide existing tools in a single user-friendly package, lowering the threshold of skill required for users to get maximal value from these existing tools. To help users apply these tools effectively, the authors introduce a number of unsupervised quality metrics that can be applied to a segmentation candidate to identify masks and regions where the segmentation results are noticeably different from the majority of the image.

      This work is valuable to researchers who are working with cell microscopy data that requires high-quality segmentation and tracking, particularly if their data are 3D time-lapse and thus challenging to segment and assess. The MorphoNet 2.0 tool that the authors present is intended to make the iterative process of segmentation, quality assessment, and re-processing easier and more streamlined, combining commonly used tools into a single user interface.

      We sincerely thank the reviewer for their thorough and encouraging evaluation of our work. We are grateful that they highlighted both the technical improvements of MorphoNet 2.0 and its potential impact for the broader community working with complex 3D+t microscopy datasets. We particularly appreciate the recognition of our efforts to make advanced segmentation and tracking tools accessible to non-expert users through a user-friendly and locally installable interface, and for pointing out the importance of error detection and correction in the iterative analysis workflow. The reviewer’s appreciation of the value of integrating unsupervised quality metrics to support this process is especially meaningful to us, as this was a central motivation behind the development of MorphoNet 2.0. We hope the tool will indeed facilitate more rigorous and reproducible analyses, and we are encouraged by the reviewer’s positive assessment of its utility for the community.

      One of the key contributions of the work is the unsupervised metrics that MorphoNet 2.0 offers for segmentation quality assessment. These metrics are used in the use cases to identify low-quality instances of segmentation in the provided datasets, so that they can be improved with plugins directly in MorphoNet 2.0. However, not enough consideration is given to demonstrating that optimizing these metrics leads to an improvement in segmentation quality. For example, in Use Case 1, the authors report their metrics of interest (Intensity offset, Intensity border variation, and Nuclei volume) for the uncurated silver truth, the partially curated and fully curated datasets, but this does not evidence an improvement in the results. Additional plotting of the distribution of these metrics on the Gold Truth data could help confirm that the distribution of these metrics now better matches the expected distribution.

      Similarly, in Use Case 2, visual inspection leads us to believe that the segmentation generated by the Cellpose + Deli pipeline (shown in Figure 4d) is an improvement, but a direct comparison of agreement between segmented masks and masks in the published data (where the segmentations overlap) would further evidence this.

      We agree that demonstrating the correlation between metric optimization and real segmentation improvement is essential. We will add new analysis comparing the distributions of the unsupervised metrics with the gold truth data before and after curation. Additionally, we will provide overlap scores where ground truth annotations are available, confirming the improvement. We will also explicitly discuss the limitation of relying solely on unsupervised metrics without complementary validation.
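
      The overlap scores mentioned above could, for example, be computed as a per-object intersection-over-union (IoU) between a candidate segmentation and a reference annotation. The sketch below is a minimal illustration under stated assumptions: the function names `label_iou` and `best_match_iou` and the toy label volumes are hypothetical, and this is not the scoring procedure used by MorphoNet 2.0 or by the authors.

```python
import numpy as np

def label_iou(pred, truth, label_pred, label_truth):
    """Intersection-over-union between one predicted and one reference object."""
    a = pred == label_pred
    b = truth == label_truth
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

def best_match_iou(pred, truth, background=0):
    """For each reference label, return the highest IoU with any predicted label."""
    scores = {}
    for lt in np.unique(truth):
        if lt == background:
            continue
        # Only predicted labels that overlap this reference object can match it.
        candidates = np.unique(pred[truth == lt])
        candidates = candidates[candidates != background]
        scores[int(lt)] = max(
            (label_iou(pred, truth, lp, lt) for lp in candidates), default=0.0
        )
    return scores

# Toy 3D label volumes (hypothetical data, not from the manuscript).
truth = np.zeros((4, 4, 4), dtype=int)
truth[:2] = 1
truth[2:] = 2
pred = np.zeros_like(truth)
pred[:2] = 7      # coincides exactly with reference object 1
pred[2:3] = 8     # covers only half of reference object 2
iou_scores = best_match_iou(pred, truth)
```

      Comparing such scores before and after curation gives a measure of improvement that is independent of the unsupervised metrics used to guide the correction.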

      We would appreciate the authors addressing the risk of decreasing the quality of the segmentations by applying circular logic with their tool; MorphoNet 2.0 uses unsupervised metrics to identify masks that do not fit the typical distribution. A model such as StarDist can be trained on the "good" masks to generate more masks that match the most common type. This leads to a more homogeneous segmentation quality, without consideration for whether these metrics actually optimize the segmentation

      We thank the reviewer for this important and insightful comment. It raises a crucial point regarding the risk of circular logic in our segmentation pipeline. Indeed, relying on unsupervised metrics to select “good” masks and using them to train a model like StarDist could lead to reinforcing a particular distribution of shapes or sizes, potentially filtering out biologically relevant variability. This homogenization may improve consistency with the chosen metrics, but not necessarily with the true underlying structures.

      We fully agree that this is a key limitation to be aware of. We will revise the manuscript to explicitly discuss this risk, emphasizing that while our approach may help improve segmentation quality according to specific criteria, it should be complemented with biological validation and, when possible, expert input to ensure that important but rare phenotypes are not excluded.

      In Use case 5, the authors include details that the errors were corrected by "264 MorphoNet plugin actions ... in 8 hours actions [sic]". The work would benefit from explaining whether this is 8 hours of human work, trying plugins and iteratively improving, or 8 hours of compute time to apply the selected plugins.

      We will clarify that the “8 hours” refer to human interaction time, including exploration, testing, and iterative correction using plugins.

      Reviewer #2 (Public review):

      Summary:

      This article presents MorphoNet 2.0, a software designed to visualise and curate segmentations of 3D and 3D+t data. The authors demonstrate its capabilities on five published datasets, showcasing how even small segmentation errors can be automatically detected, easily assessed, and corrected by the user. This allows for more reliable ground truths, which will in turn be very valuable for analysis and for training deep learning models. MorphoNet 2.0 offers intuitive 3D inspection and functionalities accessible to a non-coding audience, thereby broadening its impact.

      Strengths:

      The work proposed in this article is expected to be of great interest to the community by enabling easy visualisation and correction of complex 3D(+t) datasets. Moreover, the article is clear and well written, making MorphoNet more likely to be used. The goals are clearly defined, addressing an undeniable need in the bioimage analysis community. The authors use a diverse range of datasets, successfully demonstrating the versatility of the software.

      We would also like to highlight the great effort that was made to clearly explain which type of computer configurations are necessary to run the different datasets and how to find the appropriate documentation according to your needs. The authors clearly carefully thought about these two important problems and came up with very satisfactory solutions.

      We would like to sincerely thank the reviewer for their positive and thoughtful feedback. We are especially grateful that they acknowledged the clarity of the manuscript and the potential value of MorphoNet 2.0 for the community, particularly in facilitating the visualization and correction of complex 3D(+t) datasets. We also appreciate the reviewer’s recognition of our efforts to provide detailed guidance on hardware requirements and access to documentation—two aspects we consider crucial to ensuring the tool is both usable and widely adopted. Their comments are very encouraging and reinforce our commitment to making MorphoNet 2.0 as accessible and practical as possible for a broad range of users in the bioimage analysis community.

      Weaknesses:

      There is still one concern: the quantification of the improvement of the segmentations in the use cases and, therefore, the quantification of the potential impact of the software. While it appears hard to quantify the quality of the correction, the proposed work would be significantly improved if such metrics could be provided.

      The authors show some distributions of metrics before and after segmentations to highlight the changes. This is a great start, but there seem to be two shortcomings: first, the comparison and interpretation of the different distributions does not appear to be trivial. It is therefore difficult to judge the quality of the improvement from these. Maybe an explanation in the text of how to interpret the differences between the distributions could help. A second shortcoming is that the before/after metrics displayed are the metrics used to guide the correction, so, by design, the scores will improve, but does that accurately represent the improvement of the segmentation? It seems to be the case, but it would be nice to maybe have a better assessment of the improvement of the quality.

      We thank the reviewer for this constructive and important comment. We fully agree that assessing the true quality improvement of segmentation after correction is a central and challenging issue. While we initially focused on changes in the unsupervised quality metrics to illustrate the effect of the correction, we acknowledge that interpreting these distributions may not be straightforward, and that relying solely on the metrics used to guide the correction introduces an inherent bias in the evaluation.

      To address the first point, we will revise the manuscript to provide clearer guidance on how to interpret the changes in metric distributions before and after correction, with additional examples to make this interpretation more intuitive.
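
      One simple way to make a shift in a metric distribution easier to read is to summarize it as a single distance to a reference distribution. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distributions). It is an illustrative assumption, not the analysis planned by the authors; the function name `ks_statistic` and the sample values are hypothetical.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical values of one metric (e.g. a volume) per segmented object.
reference = [10, 11, 12, 13]   # reference (gold standard) objects
before = [10, 11, 30, 31]      # candidate segmentation before curation
after = [10, 11, 12, 14]       # candidate segmentation after curation

ks_before = ks_statistic(reference, before)
ks_after = ks_statistic(reference, after)
```

      A smaller statistic after curation indicates that the metric distribution has moved closer to the reference; it does not by itself show that individual objects are segmented correctly.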

      Regarding the second point, we agree that using independent, external validation is necessary to confirm that the segmentation has genuinely improved. To this end, we will include additional assessments using complementary evaluation strategies on selected datasets where ground truth is accessible, to compare pre- and post-correction segmentations with an independent reference. These results reinforce the idea that the corrections guided by unsupervised metrics generally lead to more accurate segmentations, but we also emphasize their limitations and the need for biological validation in real-world cases.

      Reviewer #3 (Public review):

      Summary:

      A very thorough technical report of a new standalone, open-source software for microscopy image processing and analysis (MorphoNet 2.0), with a particular emphasis on automated segmentation and its curation to obtain accurate results even with very complex 3D stacks, including timelapse experiments.

      Strengths:

      The authors did a good job of explaining the advantages of MorphoNet 2.0, as compared to its previous web-based version and to other software with similar capabilities. What I particularly found more useful to actually envisage these claimed advantages is the five examples used to illustrate the power of the software (based on a combination of Python scripting and the 3D game engine Unity). These examples, from published research, are very varied in both types of information and image quality, and all have their complexities, making them inherently difficult to segment. I strongly recommend the readers to carefully watch the accompanying videos, which show (although not thoroughly) how the software is actually used in these examples.

      We sincerely thank the reviewer for their thoughtful and encouraging feedback. We are particularly pleased that the reviewer appreciated the comparative analysis of MorphoNet 2.0 with both its earlier version and existing tools, as well as the relevance of the five diverse and complex use cases we selected. Demonstrating the software’s versatility and robustness across a variety of challenging datasets was a key goal of this work, and we are glad that this aspect came through clearly. We also appreciate the reviewer’s recommendation to watch the accompanying videos, which we designed to provide a practical sense of how the tool is used in real-world scenarios. Their positive assessment is highly motivating and reinforces the value of combining scripting flexibility with an interactive 3D interface.

      Weaknesses:

      Being a technical article, the only possible comments are on how methods are presented, which is generally adequate, as mentioned above. In this regard, and in spite of the presented examples (chosen by the authors, who clearly gave them a deep thought before showing them), the only way in which the presented software will prove valuable is through its use by as many researchers as possible. This is not a weakness per se, of course, but just what is usual in this sort of report. Hence, I encourage readers to download the software and give it time to test it on their own data (which I will also do myself).

      We fully agree that the true value of MorphoNet 2.0 will be demonstrated through its practical use by a wide range of researchers working with complex 3D and 3D+t datasets. In this regard, we will improve the user documentation and provide a set of example datasets to help new users quickly familiarize themselves with the platform. We are also committed to maintaining and updating MorphoNet 2.0 based on user feedback to further support its usability and impact.

      In conclusion, I believe that this report is fundamental because it will be the major way of initially promoting the use of MorphoNet 2.0 among its target audience. The software itself holds the promise of being very impactful for the microscopists' community.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary: 

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing from both dynamical and information-theoretic perspectives how habituation can be realized. The authors identify that negative feedback provided with a slow storage mechanism is sufficient to explain habituation.

      Strengths: 

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work and for their comments, which we believe have been instrumental in significantly improving our work and its scope. Below, we address all their concerns.

      Weaknesses: 

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery. 

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model. 

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; 5) intensity sensitivity; 6) subliminal accumulation. Here, we are following the same terminology employed in Eckert et al., Current Biology 34, 5646–5658 (2024), the paper highlighted by the reviewer. We have dedicated a section of the revised version of the manuscript to these hallmarks, substantiating the validity of our framework as a minimal model of habituation. We remark that these are the sole hallmarks that can be discussed by considering one single external stimulus and that can be identified without ambiguity in a biochemical context. This observation is again in line with Eckert et al., Current Biology 34, 5646–5658 (2024).

      In the revised version, we employ the same strategy as the aforementioned work to determine when the system can be considered “habituated”. Indeed, we introduce a response threshold that is now discussed in the manuscript. We also included a note in the discussions stating that, since any biochemical model will eventually reach a steady state, subliminal accumulation, for example, can only be seen with the use of a threshold. The introduction of different storage mechanisms, ideally more detailed at a molecular level, can shed light on this conceptual gap. This is an interesting direction of research.
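
      The idea of a response threshold can be sketched as follows. This is a minimal illustration under an assumed definition (the system counts as habituated at the first stimulus whose peak response falls below a fixed fraction of the first peak); the exact criterion used in the manuscript and in Eckert et al. is not stated here, and the function name and the peak values are hypothetical.

```python
def stimuli_to_habituation(peak_responses, threshold_fraction):
    """Return the 1-based index of the first stimulus whose peak response
    falls below threshold_fraction * (first peak), or None if never reached."""
    if not peak_responses:
        return None
    cutoff = threshold_fraction * peak_responses[0]
    for index, peak in enumerate(peak_responses, start=1):
        if peak < cutoff:
            return index
    return None

# Hypothetical peak responses to a train of identical stimuli.
peaks = [1.00, 0.90, 0.84, 0.81, 0.80, 0.80]
n_habituated = stimuli_to_habituation(peaks, threshold_fraction=0.85)
never = stimuli_to_habituation(peaks, threshold_fraction=0.5)
```

      With a threshold, the system can be declared habituated after a few stimuli even though the time-periodic steady state is only reached asymptotically.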

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses. The subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed? 

      The reviewer is correct, but this is solely a consequence of the specific set of parameters we selected. We made this choice for visualization purposes in the previous version. In the revised version, in the section discussing the hallmarks of habituation, we also show other parameter choices for which the response decrement is more pronounced. Moreover, we remark that the contour plot of \Delta⟨U⟩ clearly shows that the decrement can largely exceed the 20% threshold presented in the previous version.

      In the revised version, also in light of the works highlighted by the reviewer, we decided to move the focus of the manuscript to the information-theoretic advantage of habituation. As such, we modified several parts of the main text. Also, in the region of optimal information gain, habituation is at an intermediate level. For this reason, we decided to keep the same parameter choice as the previous version in Figure 2.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as done in Eckert et al., Current Biology 34, 5646–5658 (2024), we can state that the system is habituated after a few stimuli for each set of parameters. This aspect is highlighted in the revised version of the manuscript (see also the point above).

      The same is true for the information content (Figure 2f) - already at the first pulse, I_{U,H} ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above. 

      As for the response decrement of the readout, we can certainly choose a set of parameters for which the information gain is higher. In the revised version, we also report the information at the first stimulation and when the system is habituated to give a better idea of the range of these quantities. At any rate, as the referee correctly points out, it is difficult to give an intuitive interpretation of the information in our minimal model.

      It is also important to remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus. As such, the mutual information presents a discontinuous behavior that resembles the dynamics of the readout, thereby starting at a non-zero value already at the first stimulus.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. In the revised version, we highlighted that we discuss the information-theoretic aspects of habituation, while the aforementioned references focus on the dynamics of this phenomenon.

      Reviewer #1 (Recommendations for the authors):

      I would also like to note here the simplification of the proposed biological model - in particular, that the receptor can be in an active/passive state, as well as proposing the Nf-kB signaling module as a possible molecular realization. Generally, a large number of cell surface receptors including RTKs or GPCRs have much more complex dynamics, including autocatalytic activation that generally leads to bistability, and Nf-kB has been demonstrated to have oscillatory and even chaotic dynamics (works of Savas Tsay, Mogens Jensen and others). Considering this, the authors should at least discuss under which conditions this TNF-alpha signaling could potentially serve as a molecular realisation for habituation. 

      We thank the reviewer for bringing this to our attention. In the previous version, we reported the TNF signaling network only to show a similar coarse-grained modular structure. However, following a suggestion of reviewer #2, we decided to change Figure 1 to include a simplified molecular scheme of chemotaxis rather than TNF signaling, to avoid any source of confusion about this issue.

      Also, a minor point: Figures 2d-e are cited before 2a-c. 

      We apologize for the oversight. The structure of the Figures and their order is now significantly different, and they are now cited in the correct order. 

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation. 

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts, as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained: 

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the meaning of the finding of maximal information gain for intermediate habituation strength is for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is a delicate aspect to discuss and we thank the referee for the comment. In the revised version, we report information gain, initial information and final information, highlighting that both gain and final information are higher in regions where habituation is present. They have qualitatively similar behavior and highlight a clear information-theoretic advantage of this dynamical phenomenon. An important point is that, to determine the optimal Pareto front, we consider a prolonged stimulus and its associated steady-state information. Therefore, from the optimization point of view, there is no notion of “information gain” or “final information”, which are intrinsically dynamical quantities. As a result, the fact that the optimal curve lies in the region of optimal information gain is not expected a priori and hints at the potentially crucial role of this feature. In the revised version, we elucidate this aspect with several additional analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain (non-zero) mutual information, multiple observations of the same stimulus have to reflect into accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.
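
      To make the quantities in this exchange concrete, the sketch below computes mutual information for a discrete joint distribution and an information gain defined as the difference between a final and an initial value, following the reviewer's description. It is a toy example under stated assumptions: the manuscript's signal and readout variables need not be binary, and the two joint distributions are hypothetical numbers chosen only for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information (in bits) of a discrete joint distribution given as
    a list of rows, with joint[i][j] = P(X = i, Y = j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # terms with zero probability contribute nothing
                total += p * math.log2(p / (p_x[i] * p_y[j]))
    return total

# Hypothetical joint distributions of a binary signal and a binary readout.
joint_initial = [[0.4, 0.1], [0.1, 0.4]]   # partially correlated
joint_final = [[0.5, 0.0], [0.0, 0.5]]     # perfectly correlated

mi_initial = mutual_information(joint_initial)
mi_final = mutual_information(joint_final)
info_gain = mi_final - mi_initial
```

      In this toy case the initial mutual information is already non-zero, and the gain is the additional information carried by the readout after repeated stimulation.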

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid confusion between the usual definition of (perfect) adaptation and habituation. However, we now believe that this is not the case for the revised manuscript, and we now include chemotaxis as an example in Figure 1.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the reviewer for the suggestion. We agree that a priori, there is no reason to choose \delta Q_R or a function of the internal energy flux J_int (that, in the revised version, we are using in place of \dot\Sigma_int following the suggestion of reviewer #3). The rationale was to minimize \delta Q_R since this dissipation is unavoidable and stems from the presence of the storage inhibiting the receptor through the internal pathway. Indeed, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R.

      In the revised version, we now include in the optimization principle two energy contributions (see Eq. (14) of the revised manuscript): \delta Q_R and E_int, which is the energy consumption associated with the driven storage production per unit energy. All Figures have been updated accordingly. The results remain similar, as \delta Q_R still represents the main contribution, especially at high \beta.

      Furthermore, in the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the signal needs to be strong enough for the system to distinguish it from the intrinsic thermal noise (controlled by \beta). We also show that if the system is able to tune the inhibition strength \kappa, the Pareto frontiers at different ⟨H⟩ collapse into a single curve. This shows that, although the values of, e.g., the mutual information, depend on ⟨H⟩, the qualitative behavior of the system in this regime is effectively independent of it. We also added more details about this in the Supplementary Information.
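
      For readers less familiar with Pareto optimization, the following generic sketch extracts a Pareto front from a set of candidate parameter choices, each summarized by its information (to be maximized) and its dissipation (to be minimized). The function name and the numerical values are hypothetical; this is a plain non-dominated filter and not the optimization procedure of Eq. (14).

```python
def pareto_front(points):
    """Return the non-dominated (information, dissipation) pairs.

    A point is dominated if another point has information at least as high
    and dissipation at least as low, with at least one strict inequality.
    """
    front = []
    for i, (info_i, cost_i) in enumerate(points):
        dominated = False
        for j, (info_j, cost_j) in enumerate(points):
            if j == i:
                continue
            no_worse = info_j >= info_i and cost_j <= cost_i
            strictly_better = info_j > info_i or cost_j < cost_i
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append((info_i, cost_i))
    return sorted(front)

# Hypothetical candidates: (mutual information, total dissipation).
candidates = [(0.1, 0.5), (0.3, 1.0), (0.2, 2.0), (0.5, 3.0), (0.4, 3.5)]
front = pareto_front(candidates)
```

      Here the candidates (0.2, 2.0) and (0.4, 3.5) are discarded because another candidate achieves more information at lower dissipation.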

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels? 

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, we believe that the fact that our minimal model is able to capture the features of a complex neural system just by looking at the PCs, without any explicit biological details, is non-trivial. We also stress that the 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a-priori expectations on what it should represent. In the case of the data generated from the model, most of the variance of the activity comes from the switching signal, and similar considerations can be made for the looming stimulations in the data. We updated the manuscript to clarify this point.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract makes it sound like a new finding is that habituation is due to a slow, negative feedback mechanism. But, as mentioned in the introduction, this is a well-known fact. 

      We agree with the reviewer. We have revised the abstract.

      (2) Figure 2c Why does the range of Delta Delta I_f include negative values if the corresponding region is shaded (right-tilted stripes)? 

      The negative values in the range are those attained in the shaded region with right-tilted stripes. We decided to include them in the colorbar for clarity, since Delta Delta I_f is also plotted in the region where it attains negative values.

      (3) What does the Pareto front look like if the optimization is done for input statistics given by ⟨H⟩_min? 

      In the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the strength of the signal is crucial for the system to discriminate input and thermal noise (see also the answers above).

      In particular, in Figure 4 we explicitly compare the results of the Pareto optimization (which is done with a static input of a given statistics) with the dynamics of the model for different values of ⟨H⟩ in two scenarios, i.e., adaptive and non-adaptive inhibition strength (see answers above for details).

      We also remark that ⟨H⟩_min represents the background signal that the system is not trying to capture, which is why we never used it for optimization.

      (4) From the main text, it is rather difficult to understand how the comparison to the experimental data was performed. How was the PCA done exactly? What are the "features" of the evoked neural response? 

      The PCA on data is performed starting from the single-neuron calcium dynamics. To perform a fair comparison, we use our model to reconstruct a similar but extremely simplified dynamics, as explained in Methods, and perform the PCA on the analogous simulated data. We added a comment on this in the revised version. While these components capture most of the variance in the data, their specific interpretation is usually out of reach and we believe that it lies beyond the scope of this theoretical work. We also remark that the model does not contain all these biological details - a strong aspect in our opinion - and, as such, it cannot capture specific biological features.
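
      To make the notion of the first principal component concrete, the sketch below computes it by power iteration on the covariance matrix of a small activity table (rows are time points, columns are neurons). The data and the function name are hypothetical, and this is not the analysis pipeline described in Methods; it only illustrates that the 1st PC is the direction capturing most of the variance.

```python
import math

def first_principal_component(samples, iterations=100):
    """Return (unit vector, explained variance ratio) of the leading PC.

    samples: list of observations, each a list of per-neuron activities.
    """
    n, dim = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - means[k] for k in range(dim)] for s in samples]
    # Sample covariance matrix between neurons.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Power iteration; the start vector is arbitrary and assumed not to be
    # orthogonal to the leading eigenvector.
    v = [1.0 / (k + 1) for k in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b]
                   for a in range(dim) for b in range(dim))
    return v, variance / sum(cov[a][a] for a in range(dim))

# Hypothetical activity of two neurons at four time points.
activity = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pc1, explained = first_principal_component(activity)
```

      In this toy table the second neuron is always twice as active as the first, so a single component along the direction (1, 2) accounts for all of the variance.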

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment. 

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination. 

      We thank the reviewer for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed: 

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the reviewer for raising this point. In the revised version, we have changed the abstract to reflect the reviewer’s points and the new structure and results of the manuscript.

      (2) Several clarifications are needed on the treatment of energy dissipation. 

      -   When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the reviewer for this typo. Indeed, \sigma sets the energy scale of feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., in Eq. (1) together with \kappa. This typo has been corrected in the revised manuscript, and all subsequent equations are consistent.

      -   I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on ⟨H⟩, however, is not fully clear. If the environment were static and the memory block was absent, the term with ⟨H⟩ would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence.

      By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript), since its presence is solely due to the existence of a storage population. Therefore, in this case, the receptor would be a 2-state, 1-pathway system and, as such, it would always satisfy an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript would not hold anymore and the receptor would not exhibit any dissipation. Thus, in a static environment and without a memory block, no receptor dissipation would be present. We would also like to stress that our choice to model two different pathways has been motivated by the observation that the negative feedback acts along a different pathway in several biochemical and biological examples. We made some changes to the model description in the revised version and we hope that this aspect has been clarified.

      -   Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate? 

      We agree with the referee that the reverse reaction we considered is not the microscopic reverse of the storage production. In the case of a fast readout population, we employed a coarse-grained view to compute this entropy production. To be more precise, we gladly welcomed the referee’s suggestion in the revised version and modified the manuscript accordingly. As suggested, we now employ the energy flux associated with the storage production to estimate the internal dissipation (see new Fig. 3). 

      In the revised version, we also use this quantity in the optimization procedure in combination with \delta Q_R (see new Fig. 4) to have a complete characterization of the system’s energy consumption. The conclusions are qualitatively identical to before, but we believe that now they are more solid from a theoretical perspective. For this important advance in the robustness and quality of our work, we are profoundly grateful to the referee.
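
      The detailed-balance argument given above for the receptor (a two-state system with a single pathway does not dissipate at steady state, whereas two differently driven pathways do) can be checked numerically. The sketch below uses a generic two-state Markov model with k_B = 1; the rate values and the function name are hypothetical and are not the receptor rates of Eq. (1).

```python
import math

def two_state_dissipation(a_up, a_down, b_up=0.0, b_down=0.0):
    """Steady-state entropy production rate (k_B = 1) of a two-state system.

    Pathway A has rates a_up (off -> on) and a_down (on -> off). The optional
    pathway B has rates b_up and b_down and must be fully present or absent.
    """
    total = a_up + a_down + b_up + b_down
    p_on = (a_up + b_up) / total       # steady-state occupation of "on"
    p_off = (a_down + b_down) / total  # steady-state occupation of "off"
    rate = 0.0
    for up, down in ((a_up, a_down), (b_up, b_down)):
        if up > 0.0 and down > 0.0:
            flux = p_off * up - p_on * down   # net off -> on flux on this pathway
            rate += flux * math.log((p_off * up) / (p_on * down))
    return rate

single_pathway = two_state_dissipation(2.0, 1.0)           # no second pathway
two_pathways = two_state_dissipation(2.0, 1.0, 1.0, 3.0)   # hypothetical rates
```

      With one pathway the net flux vanishes and so does the dissipation; with two pathways whose rate ratios differ, a steady cycle current flows and the entropy production rate is strictly positive.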

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics? 

      The initial stimulus is indeed stochastic with an average constant in time and mimics the background (small) signal. We apply the (strong) stimulation when the system has already reached a stationary state with respect to the background. As can be appreciated in Fig. 2 of the revised version, the model response depends on the pre-stimulus level, since it sets the storage concentration before the stimulation arrives and, as such, the subsequent habituation dynamics. This dependence is important from a dynamical perspective. The information-theoretic picture has been developed, as said above, by letting the system relax before the first stimulus. This eliminates this arbitrary dependence and provides a clearer idea of the functional advantages of habituation. Moreover, the optimization procedure is performed in a completely different setting, with no pre-stimulus at all, since we only have one prolonged stimulation. We hope that the revised version is clearer on all these points.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity and we thank the reviewer for spotting this issue. In Figure 4 (now Figure 5 in the revised manuscript) Δ⟨S⟩ is not exactly zero, but equal to 0.15% at the final point. It appeared as 0% in the plot due to an unwanted rounding in the plotting function that we missed. This has been fixed in the revised version, thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 2 | "Figure 1b-e" should be "Figure 1b-d" since there is no panel (e) in Figure 1. 

      (2) Figure 1a | In the top schematic, the symbol "k" is used, while in the rest of the text, the proportionality constant is denoted by κ. 

      We thank the reviewer for pointing this out. Figure 1 has been revised and the panels are now consistent. The proportionality constant (the inhibition strength) has also been fixed.

      (3) Figure 1a | I find the upper part of the schematic for Storage hard to perceive. I understand the lower part stands for the degradation reaction for storage molecules. The upper part stands for the synthesis reaction catalyzed by the readout population. I think the bolded upper arrow would explain it sufficiently well; the left/right arrows, together with the crossed green circle make that part of the figure confusing. Consider simplifying. 

      We decided to remove the left/right arrows, as suggested by the reviewer, as we agree that they were unnecessarily complicating the schematic. We hope that the revised version will be easier to understand.

      (4) Page 3 | It would be helpful to tell what the temporal statistics of the input signal $p_H(h,t)$ is, i.e. <h(t) h(t')>. Looking at the example trajectory in Figure 1a, consecutive signal values do not seem correlated. 

      We agree with the reviewer that this is an important detail and worth mentioning. We now explicitly state that consecutive values are not correlated, for simplicity. 

      (5) Figure 2 | I believe the label "EXTERNAL INPUT" refers to the *average* external input, not one specific realization (similar to panels (d) and (e) that report on average metrics). I suggest you indicate this in the label, or, what may be even better, add one particular realization of the stochastic input to the same graph.

      We thank the reviewer for spotting this. We now write that what we show is the average external signal. We prefer this solution rather than showing a realization of the stochastic input, since it is more consistent with the rest of the plots, where we always show average quantities. We also note that Figure 2 is now Figure 3 in the revised manuscript.

      (6) Figure 2d | The expression of Δ⟨U⟩ is the negative of the definition in Eq. (5). It should be corrected. 

      In the revised version, both the definitions in Figure 2 (now Figure 3) and in the text (now Eq. (11)) are consistent.

      (7) Figure 3(d-e) caption | "where ⟨U⟩ starts to be significantly smaller than zero." There, it should be Δ⟨U⟩ instead of ⟨U⟩. 

      Thanks again, we corrected this typo.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Wnt3 cue and global PCP. PCP has been described in detail in a previous paper on Clytia (Momose et al, 2012): its orientation along the oral-aboral body axis (ciliary basal body positioning studies), and its function in directional polarity during gastrulation (Stbm-, Fz1-, and Dsh-MO experiments). I wonder if this part could be shortened. What is new, however, are the knockdown and Wnt3-mRNA rescue experiments, which provide a deeper insight into the link between Wnt3 function in the blastopore organiser as a source or cue for axis formation. These experiments demonstrate that the Wnt3 knockdown induces defects equivalent to PCP factor knockdown, but can be rescued by Wnt3-mRNA injection, even at a distance of 200 µm away from the Wnt-positive area. The experimental set-up of these new molecular experiments follows in important aspects those of Freeman's experiments of 1981 (who in turn was motivated to re-examine Teissier's work of 1931/1933 ...). Freeman did not use the term "global polarity" but the concept of an axis-inducing source and a long-range tissue polarity can be traced back to both researchers.

      We appreciate the reviewer’s insightful comments for evolutionary biology and cnidarian developmental biology.

      Concerning the presentation of the basic PCP structure of Clytia embryo epidermal cells, we prefer to retain this section unless there is a strict limit on manuscript length. These experiments provide background information necessary to establish the biological system for the readers. The structures of cells, notably cell adhesion, cilia, and the cytoskeleton, are essential components of this system.

      We have restored sentences concerning the historical contributions of Freeman and Teissier from a previous version of the manuscript.

      Freeman’s work offered two key insights. The first is the concept that cell polarity spreads and self-organizes over the distances revealed by the tissue orientation of aggregate embryonic cells (Freeman, 1981 https://doi.org/10.1007/BF00867804), which was termed “global polarity” in a review by Primus and Freeman (2004 https://doi.org/10.1002/bies.20031). This concept closely resembles the modern understanding of PCP coordination mechanisms mediated by core PCP interactions. Remarkably, Freeman proposed this idea in the early 1980s, at the same time as the first characterization of PCP mutants in Drosophila (Gubb and Garcia-Bellido 1982). The second is the role of egg polarity in defining the axis. Freeman demonstrated that the position of the first cleavage furrow predicts the oral-aboral axis by a series of sophisticated experiments. This was a milestone for the studies of cnidarian body axis development.

      However, some of Freeman’s interpretations were misleading. In the 1981 paper, he stated:

      "Polarity

      Other work that I have done has established that the anterior-posterior axis of the planula is set up at the time of the first cleavage; the site where cleavage is initiated specifies the posterior pole of this axis (Freeman 1980). The experiment reported here in which embryos were cut into halves and each half regulated to form a normal planula with the same polarity properties as the embryo it is from provides evidence that these polarity properties are remarkably stable at all developmental stages tested ranging from 4 cell to postgastrula embryos. "

      Freeman hypothesised that cell polarity at the 2- or 4-cell stage, referred to as the “polarity of first cell cleavage,” is directly inherited as the global polarity observed in later developmental stages.

      In the review by Primus and Freeman (2004), two hypotheses were introduced: (1) maternally localised factors, such as mRNA, determine the axis, and (2) the cell polarity of cleavage furrow formation is inherited by later stages and determines the axis. Freeman described these two hypotheses as mutually exclusive. However, we now know that cell polarity at early cleavage stages does not directly contribute to global polarity/PCP. Instead, Wnt/β-catenin signaling is regionally activated by maternally localised mRNAs distributed along the egg polarity (Momose, 2007; Momose, 2008), which maintain Wnt3 localisation and direct morphological axis patterning. The study presented in this article unifies these hypotheses.

      On the second point, as the reviewer noted, Freeman indeed revisited the work of Georges Teissier (Teissier, 1931), who conducted similar experiments on Amphisbetia embryos. It was Teissier who first described how the egg polarity is preserved in later stages and defines the axis. Teissier, however, carefully avoided asserting continuity between egg and blastula polarities, allowing for the possibility of “rétablissement” (re-establishment). As Teissier stated:

      "…On constate, en second lieu, que la polarité de l’œuf se conserve dans chacun de se fragment et que le maintien ou le rétablissement de cette polarité sont indispensables à un développement normal. Un fragment d’œuf ou de morula n’a aucune partie ni aucun blastomère qui soit rigoureusement déterminé comme endoderme, mais possède, par contre, un pôle antérieur et un pôle postérieur bien définis.…

      Mais cette proposition, qui ne semble pourtant guère dépasser la simple constatation des faits, soulève de grave difficulté. Elle donne en effet à la polarité, propriété encore bien mystérieuse, un rôle morphogénétique de premier ordre et implique des conséquences trop importantes pour qu’on puisse l’accepter sans un très sérieux examen.

      Comme je ne pense pas que les questions relatives à la nature des localisation germinales, à l’existence et au fonctionnement des organisateurs de l’œuf des Cœlentérés, puissant, dans l’état actuel de nos connaissances, être discutées utilement, je ne veux voir dans la proposition précédente qu’une façons commode et tout provisoire de systématiser les faits."

      English translation:

      “We note also that the polarity of the egg is preserved in each fragment and that the maintenance or re-establishment of this polarity is essential for normal development. A fragment of egg or morula has no part or blastomere that is rigorously determined as endoderm, but has, on the other hand, a well-defined anterior and posterior pole....

      But this proposition, which hardly seems to go beyond the simple observation of facts, raises serious difficulties. It gives polarity, still a mysterious property, a morphogenetic role of the first order, and implies consequences too important to be accepted without very serious examination.

      As I do not believe that questions concerning the nature of germinal localisation, or the existence and functioning of the egg organisers in Coelenterates, can, in the present state of our knowledge, be usefully discussed, I prefer only to see in the foregoing proposition a convenient and very provisional way of systematising the facts.”

      Teissier, G. (1931). Étude Expérimentale du Développement de Quelques Hydraires. Ann. Sc. Nat. Zool 14, 5–59.

      Teissier's interpretation and caution were reasonable.

      Our work connects recent molecular research on axis specification mechanisms in cnidarians with the classic experimental studies of Freeman and Teissier. We believe it is essential to present and acknowledge their conceptual contributions.  We have updated the Discussion to include these points.

      (2) PCP propagation and β-catenin. The central but unanswered question in this study focuses on the interaction between Wnt3 and PCP and the propagation of PCP. Wnt3 has been described in cnidarians but also in vertebrates and insects as a canonical Wnt interacting with β-catenin in an autocatalytic loop. The surprising result of this study is that the action of Wnt3 on PCP orientation is not inhibited in the presence of a dominant-negative form of CheTCF (dnTCF) ruling out a potential function of β-catenin in PCP. This was supported by studies with constitutively active β-catenin (CA-β-cat) mRNA which was unable to restore PCP coordination nor elongation of Wnt3-depleted embryos but did restore β-catenin-dependent gastrulation. Based on these data, the authors conclude that Wnt3 has two independent roles: Wnt/β-catenin activation and initial PCP orientation (two-step model for PCP formation). However, the molecular basis for the interaction of Wnt3 with the PCP machinery and how the specificity of Wnt3 for both pathways is regulated at the level of Wnt-receiving cells (Fz-Dsh) remain unresolved. Also, with respect to PCP propagation, there is no answer with respect to the underlying mechanisms. The authors found that PCP components are expressed in the mid-blastula stage, but without any further indication of how the signal might be propagated, e.g., by a wavefront of local cell alignment. Here, it is necessary to address the underlying possible cellular interactions more explicitly.

      The question of how Wnt3 interacts with the core PCP complex remains open for future investigation. An obvious hypothesis is that one of the Frizzled receptors binds Wnt3 ligands. For additional details, please refer to the response to Reviewer 2’s comment. Regarding other non-classic Wnt receptors, studies in the developing mouse limb have demonstrated that a Wnt5a gradient controls PCP polarisation via ROR receptors and graded Strabismus phosphorylation (Gao et al., 2011, https://doi.org/10.1016/j.devcel.2011.01.001). However, in this context, the Wnt5a gradient influences the frequency of polarised cells rather than PCP orientation. In Clytia, we performed gene knockdown experiments targeting ROR and RYK receptors using Morpholinos but did not observe any effect on axial patterning, suggesting that these receptors are unlikely to be involved in Wnt3 interaction.

      Concerning PCP propagation mechanisms, these are well-characterized in both Drosophila and vertebrates and conserved across taxa. The localised Fz-Fmi complex at the apical cortex of a cell interacts with the oppositely localised Stbm-Fmi complex in neighbouring cells, enabling coordination of PCP between directly adjacent cells. This interaction provides a comprehensive explanation for PCP propagation mechanisms. In Drosophila, the “domineering non-autonomy” effect is a well-documented phenomenon where PCP orientation autonomously propagates from core PCP mutant mosaic patches. Overall, PCP propagation is a conserved and robust mechanism across metazoans.

      (3) The proposed two-step model for PCP formation has important evolutionary implications in that it excludes the current alternate model according to which a long-range Wnt3-gradient orients PCP ("Wnt/β-catenin-first"). Nevertheless, the initial PCP orientation by Wnt3 - as proposed in the two-step model - is not explained at all on the molecular level. Another possible, but less well-discussed and studied option for linking Wnt3 with PCP action could be the role of other Wnt pathways. The authors convincingly show that Wnt3 is the most highly expressed Wnt in Clytia at all stages of development (Figure S1). However, Wnt7 is also more highly expressed, which makes it a candidate for signal transduction from canonical Wnts to PCP Wnts. An involvement of Wnt7 in PCP regulation has been described in vertebrates (http://dx.doi.org/10.1016/j.celrep.2013.12.026). This would challenge the entire discussion and speculation on the evolutionary implications according to which PCP Wnt signaling comes first ("PCP-first scenario") and canonical Wnt signaling later in metazoan evolution.

      First of all, we apologise that the expression profile of Wnt7 originally provided in Figure S1 was incorrect; Wnt7 is not expressed in the embryonic stage. The error came from the accession number XLOC_034538 assigned to two transcripts, Wnt7 and Ataxin10, in the published genome assembly. Once the expression profile is revised in this light, the data are consistent with the in situ hybridisation data published in Momose et al. (2012, https://doi.org/10.1242/dev.084251). Wnt3 is the only Wnt ligand detectable between egg and gastrula stages. We appreciate the reviewer highlighting this issue and have corrected Figure S1.

      If we understand correctly, the reviewer raises the possibility that Wnt3's downstream canonical Wnt/β-catenin pathway activates the expression of other Wnt genes, which in turn orient the PCP. Indeed, we showed that the expression of Wnt1 (previously called WntX2), Wnt2 (WntX1A), Wnt5 and Wnt6 (Wnt9) all becomes undetectable at the planula stage following Wnt3-MO injection (Momose et al., 2012). So, it is a reasonable concern.

      This possibility can be excluded because the canonical pathway activation by CA-β-cat does not restore PCP in Wnt3-MO-injected embryos and Wnt3 can orient PCP without Wnt/β-catenin pathway activity in the presence of dominant negative TCF (dnTCF). Concerning Wnt1b and Wnt11b, these transcripts are maternally stored and even more abundant than Wnt3. However, we can conclude that these do not have any role in axis patterning based on the complete axis loss in Wnt3-MO morphants.

      Lastly, it should of course be remembered that the chronological order of characters appearing in a developmental process does not necessarily reflect their appearance in evolution from ancestral to modern.

      (4) The discussion, including Figure 6, is strongly biased towards the traditional evolutionary scenario postulating a choanozoan-sponge ancestry of metazoans. Chromosome-linkage data of pre-metazoans and metazoans (Schulz et al., 2023; https://doi.org/10.1038/s41586-023-05936-6) now indicate a radically different scenario according to which ctenophores represent the ancestral form and are sister to sponges, cnidarians and bilaterians (the Ctenophora-sister hypothesis). This has also implications for the evolution of Wnt signalling, as discussed in the recent Nature Genetics Review by Holzem et al. (2024) (https://doi.org/10.1038/s41576-024-00699-w). Furthermore, it calls into question the hypothesis of a filter-feeding multicellular gastrula-like ancestor as proposed by Haeckel (Maegele et al., 2023). These papers have not yet been referenced, but they would provide a more robust discussion.

      I overlooked the excellent work of Holzem and colleagues. I appreciate this suggestion. The work, unfortunately, focusses mainly on the Wnt/β-catenin pathway. The PCP pathway consists of not only core PCP (Fmi, Stbm, Pk, Dgo, Fz and Dsh) but many other components, such as Rho GTPase, which are all dealt with as “PCP” in this review. While the full set of core PCP is present only in the phylum Cnidaria and bilaterians, Pk and Dgo are present in choanoflagellates and Rho GTPase or ROCK are present even in fungi (Lapébie et al., 2011, https://doi.org/10.1002/bies.201100023). Holzem et al. described PCP as absent in ctenophores, likely based on the lack of Fmi/Stbm, while claiming its presence in fungi based on Rho GTPase and ROCK. This led to their argument that the Wnt/β-catenin pathway is more ancestral, supported by the absence of PCP components in ctenophores alongside the ctenophore-sister hypothesis.

      This likely reflects the limited attention given to PCP in the metazoan evolutionary biology community. Our work sheds light on the importance of PCP regulation in metazoan evolution. In the revised Discussion, we emphasise this point together with the importance of cell biology studies in basal metazoans and compare them based on functional studies.

      The observation of Aiptasia’s predatory “gastrula-like” larvae is indeed fascinating. Understanding how early metazoan ancestors obtained nutrients is a key to uncovering the origins of metazoans. However, the relevance of this work to metazoan evolution remains unclear. Predatory nutrient uptake is common among cnidarians, and the findings of Maegele et al. could suggest that the predatory gastrula-like state is ancestral, with the symbiotic state being derived, within Cnidaria, but they do not notably support this for Metazoa as a whole. Also, it has to be clarified how predation is defined. Fundamentally, there is little distinction between filter-feeding and predatory feeding regarding heterotrophy; both feeding types require digestive machinery. If active feeding behaviour is the essence of predation, this would be better addressed as an evolutionary neurobiology or neuroscience question. Another mystery is what the metazoan ancestors took as food if they were predatory; a non-predatory metazoan would have to have been present already to serve as their food.

      Overall, Maegele’s work seems premature to be incorporated into the metazoan evo-devo discussion. In either case, the standard approach would involve comparative studies across taxa. It will be interesting to see follow-up works on comparative and functional genomics of predatory/digestive machinery within the phylum Cnidaria and across Metazoa, including sponges and ctenophores.

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s expertise and recommendations regarding Wnt and PCP signalling. It would be our great pleasure if our work is seen and referenced by the cell biology community using model animals.

      (1) According to the 2-step model, one would expect that there is a temporal gradient in the spreading of the PCP from oral to aboral. Is there any indication for this?

      The best indication of a spatial and temporal gradient of PCP establishment observed so far is at the blastula stage (Fig.2B). PCP gradually becomes coordinated starting at 9 hpf, when PCP is slightly better organised close to the Wnt3-positive area (oral) compared to distal (aboral) areas. We did live imaging with tagged Poc1 to track the positions of centrioles in each cell (Fig. 2E), but this did not provide any further information about the spreading of the PCP. We hypothesise that there is a delay between PCP polarisation—established through the subcellular localisation of core PCP components—and its structural manifestation as ciliary positioning and orientation. This delay likely varies between cells, preventing the formation of a precise spatial PCP wave. We hope in the future to address this temporal aspect by live-imaging of core PCP proteins labelled with fluorescent proteins.

      (2) PCP is likely to be an all-or-nothing effect, while axial patterning is dose-dependent. Is there a critical level of Wnt3 required to kick off the PCP pathway?

      We agree that the PCP phenotype is all-or-nothing. Although we did not perform a quantitative test, we have not seen any intermediate phenotypes in Wnt3-rescue experiments. In our experimental condition (100 ng/µl mRNA), the Wnt3 mRNA injection into a blastomere consistently restores the body axis (via PCP) of Wnt3-MO injected embryos. No axis restoration was observed at 1 ng/µl. At 10 ng/µl, some embryos showed a restored elongated axis, while others showed no axis. The volume of injection is not precisely controllable and can easily vary two-fold, so we assume the threshold is somewhere around 10 ng/µl. This contrasts with endoderm rescue via Wnt/β-catenin activation by GSK3-β inhibitors (alsterpaullone) or the constitutively active β-catenin (CA-β-cat), which occurs in a dose-dependent manner (e.g. Supplementary Figure S2).

      (3) The key question left unaddressed is whether Wnt3 signals through one or two different Frizzled receptors? Which Frizzled receptors are candidates for this? Could they be knocked down to see which pathway (or both) is affected?

      How Wnt3 orientates the PCP system is an extremely interesting question that needs to be answered, and we plan to address this in the future. In Clytia, four Frizzled genes have been identified in the genome: CheFz1 (vertebrate counterpart of Fz1, 2, 3, 6 and 7), CheFz2 (Fz5 and 8), CheFz3 (Fz9/10) and CheFz4 (Fz4). Knockdown of CheFz1, hereafter called Fz1, by Morpholino showed a PCP phenotype (Momose 2012, supplementary data). For a long time, we have suspected that the most likely candidate for PCP mediation is CheFz1. The Wnt3-rescue experiment in a CheFz1-blocked background (similar experiment to Figure 3E, F) could potentially have answered this question. No PCP orientation would be expected even near the Wnt3-mRNA injected area if CheFz1 was the Wnt3 receptor for PCP orientation. Unfortunately, no reliable PCP phenotype was observed in this experiment, so this experiment was not included in the manuscript. We initially thought this was due to incomplete suppression of CheFz1 mRNA translation by the Morpholino when used at sub-toxic doses. But we now favour the alternative explanation that Fz1 does not mediate the Wnt3 signal responsible for initiating PCP orientation. We have previously shown that Fz1 is required for the Wnt/β-catenin pathway (indicated by nuclear β-catenin localisation; Momose 2007), which is then required to maintain Wnt3 expression. We cannot rule out that the PCP phenotype obtained previously following Fz1 knockdown (supplementary data in Momose 2012) is an indirect effect of Wnt3 downregulation.

      In future work, we plan to test the PCP involvement of the other Clytia Frizzleds, notably CheFz2 and CheFz4, which are not present as maternal mRNAs but are zygotically expressed in the early gastrula stage. CheFz3 is unlikely to be a candidate because it is aborally localised and acts as a negative receptor for the Wnt/β-catenin pathway (Momose 2007). Lastly, in unpublished experiments, no axial phenotype was obtained with ROR and RYK knockdown by Morpholino (T. Momose unpublished). 

      Based on these considerations, our current working hypothesis is that Wnt3 somehow stabilises or activates one of the Frizzled receptors acting as a core PCP protein in a polarised manner, likely at the oral side of each cell (Stbm is localised at the aboral side), which breaks the PCP symmetry and is propagated across the body axis.

      A few lines have been added to the discussion regarding this point.

      (4) Is there also PCP within the Wnt3 expressing domain? In other words, (and linked to question 2), does PCP require a certain concentration of Wnt3 or a gradient of Wnt3 in order to provide an orientation?

      In the context of a simple Wnt3-MO rescue experiment, PCP is coordinated within the Wnt3-positive area. But this could be because PCP can propagate in both orientations, so it does not answer the question. In the Wnt3-rescue experiments in Fmi-MO and Stbm-MO embryos, PCP seemed better oriented close to the boundary between Wnt3-positive and -negative areas, in particular outside the Wnt3-positive area, and rather uncoordinated deep in the middle of the Wnt3-RNA positive area.

      If Wnt3 expression is uniform across an embryo, as achieved by Wnt3-mRNA injection into the egg, the axis will be lost entirely (Momose 2008). We interpret these observations as indicating that Wnt3 expression "contrasts" (or steep gradients) act as the PCP orientation cue, rather than Wnt3 acting in a permissive manner.

      In normal development, mRNA expression detected by in situ hybridisation has a slight gradient, but we do not have any information about the endogenous protein distribution.

      We greatly appreciate the reviewer’s insightful comments. A few sentences addressing points (2) and (4) have been added. The graphical models in Figures 4 and 6A have been updated. While these are relatively minor changes to the manuscript, they significantly impact future perspectives.

      Minor comments:

      (1) Labeling in some of the figures is too small and not legible, e.g. Figures 4E-H. Please check and improve.

      Agreed. Some labelling was way too small (2.5 points). This has been corrected. The minimum font size is now 6-point for most labelling in the revised Figures. 

      (2) Page 13: ...and allow us to novel scenarios for PCP-driven axis symmetry breaking... seems to lack the verb "propose"

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Compelling and clearly described work that combines two elegant cell fate reporter strains with mathematical modelling to describe the kinetics of CD4+ TRM in mice. The aim is to investigate the cell dynamics underlying the maintenance of CD4+TRM.

      The main conclusions are that:

      (1) CD4+ TRM are not intrinsically long-lived.

      (2) Even clonal half-lives are short: 1 month for TRM in skin, and even shorter (12 days) for TRM in lamina propria.

      (3) TRM are maintained by self-renewal and circulating precursors.

      Strengths:

      (1) Very clearly and succinctly written. Though in some places too succinctly! See suggestions below for areas I think could benefit from more detail.

      (2) Powerful combination of mouse strains and modelling to address questions that are hard to answer with other approaches.

      (3) The modelling of different modes of recruitment (quiescent, neutral, division linked) is extremely interesting and often neglected (for simpler neutral recruitment).

      Weaknesses/scope for improvement:

      (1) The authors use the same data set that they later fit for generating their priors. This double use of the same dataset always makes me a bit squeamish as I worry it could lead to an underestimate of errors on the parameters. Could the authors show plots of their priors and posteriors to check that the priors are not overly-influential? Also, how do differences in priors ultimately influence the degree of support a model gets (if at all)? Could differences in priors lead to one model gaining more support than another?

      We now show the priors and posteriors overlaid in Figure S2. The posteriors lie well within the priors, giving us confidence that the priors are not overly influential.

      (2) The authors state (line 81) that cells were "identified as tissue-localised by virtue of their protection from short-term in vivo labelling (Methods; Fig. S1B)". I would like to see more information on this. How short is short term? How long after labelling do cells need to remain unlabelled in order to be designated tissue-localised (presumably label will get to tissue pretty quickly -within hours?). Can the authors provide citations to defend the assumption that all label-negative cells are tissue-localised (no false negatives)?

      And conversely that no label-positive cells can be found in the tissue (no false positives)? I couldn't actually find the relevant section in the methods and Figure S1B didn't contain this information.

      We did describe the in vivo labeling in the first section of Methods (it was for 3 mins before sacrifice). The two aims of Fig S1B were to show the gating strategy (label-positive and negatives from tissue samples were clearly separated) and to address the false-positive issue. Less than 3% of cells in our tissue samples were positive; therefore, at most 3% of truly tissue-resident cells acquired the i.v. label, and likely less. Excluding those (as we did) therefore makes little difference to our analyses in terms of cell numbers. False negative rates are expected to be extremely low; labeling within circulating cells is typically >99% (see refs in Methods).

      (3) Are the target and precursor populations from the same mice? If so is there any way to reflect the between-individual variation in the precursor population (not captured by the simple empirical fit)? I am thinking particularly of the skin and LP CD4+CD69- populations where the fraction of cells that are mTOM+ (and to a lesser extent YFP+) spans virtually the whole range. Would it be nice to capture this information in downstream predictions if possible?

      This is a great point. We do indeed isolate all populations from each mouse. We are very aware of the advantages of using this grouping of information to reduce within-mouse uncertainty – we employ this as often as we can. The issue here was that the label content within the tissue (target) at any time depends on the entire trajectory of the label frequency in the precursor, in that mouse, up to that point. We can’t identify this curve for each animal individually – so we are obliged to use a population average.

      To mitigate this lack of pairing we do take a very conservative approach and fit this empirical function describing the trajectories of YFP and mTom in precursors at the same time as the label kinetics in the target; that is, we account for uncertainty in label influx in our fits and parameter estimates.

      Another issue is that to be sure that we are performing model selection appropriately, we only use the distribution of the likelihood on the target observations when comparing support for different precursors with LOO-IC. If we had been able to pair the precursor and target data in some way, the two would then be entangled and model comparison across precursors would not be possible.

      We’ve added some of this to the discussion.

      (4) In Figure 3, estimates of kinetics for cells in LP appear to be more dependent on the input model (quiescent/neutral/division-linked) than the same parameters in the skin. Can the authors explain intuitively why this is the case?

      This is a nice observation and it has a fairly straightforward explanation. As we pointed out in the paper, estimated rates of self renewal become more sensitive to the mode of recruitment the greater the rate of influx. If immigrants are quiescent, all Ki67 in the tissue has to be explained by self renewal. If all new immigrants are Ki67 high, the estimate of the rate of self renewal within the tissue will be lower. Across the board, the estimated rates of influx into gut were consistently higher than those in skin, and so the sensitivity of parameters to the mode of recruitment was much more obvious at that site.

      The importance of this trade-off for the division linked model can also be seen when you look at the neutral and quiescent models; they give similar parameter estimates because the Ki67 levels within all precursor populations were all less than 25% and so those two modes of recruitment are difficult to distinguish.
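      The trade-off described above can be illustrated with a toy steady-state calculation. This is only a sketch and not the model fitted in the paper: it assumes a tissue population of constant size, a per-capita influx rate `theta`, a self-renewal rate `rho`, division producing two Ki67-high cells, and loss of Ki67 expression at rate `beta`. All function names, parameter names and numerical values are hypothetical.

```python
def self_renewal_rate(k_tissue, theta, k_source, beta):
    """Self-renewal rate rho implied by a toy steady-state Ki67 balance.

    Assumed balance (illustration only, not the authors' model):
        theta * k_source + 2 * rho = (beta + theta + 2 * rho) * k_tissue
    where k_tissue is the Ki67-high fraction in the tissue and k_source is
    the Ki67-high fraction among newly arrived immigrants.
    """
    return (k_tissue * (beta + theta) - theta * k_source) / (2.0 * (1.0 - k_tissue))


# Hypothetical values chosen only to show the qualitative effect.
K_TISSUE = 0.20      # observed Ki67-high fraction in the tissue
K_PRECURSOR = 0.10   # Ki67-high fraction in the precursor population
BETA = 1.0 / 3.5     # rate of loss of Ki67 expression (per day)

# Ki67-high fraction among immigrants under each mode of recruitment.
MODES = {
    "quiescent": 0.0,          # immigrants all arrive Ki67-low
    "neutral": K_PRECURSOR,    # immigrants mirror the precursor pool
    "division-linked": 1.0,    # immigrants all arrive Ki67-high
}


def estimates(theta):
    """Implied self-renewal rate under each recruitment mode."""
    return {mode: self_renewal_rate(K_TISSUE, theta, k_src, BETA)
            for mode, k_src in MODES.items()}


def spread(theta):
    """Gap between the quiescent and division-linked estimates."""
    est = estimates(theta)
    return est["quiescent"] - est["division-linked"]


low_influx = estimates(0.01)    # hypothetical low-influx site
high_influx = estimates(0.05)   # hypothetical high-influx site
```

      In this sketch the gap between the quiescent and division-linked estimates equals `theta / (2 * (1 - k_tissue))`, so it grows in proportion to the influx rate, while the quiescent and neutral estimates stay close whenever the precursor Ki67-high fraction is small.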

      (5) Can the authors include plots of the model fits to data associated with the different strengths of support shown in Figure 4? That is, I would like to know what a difference in the strength of say 0.43 compared with 0.3 looks like in "real terms". I feel strongly that this is important. Are all the fits fantastic, and some marginally better than others? Are they all dreadful and some are just less dreadful? Or are there meaningful differences?

      This is another good point (and from the author recommendations list, is your most important concern).

      We find that a fairly common issue is that models that are clearly distinguished by information criteria or LRTs can often give visually quite similar fits. Our experience is that this is partly due to the fact that models are usually fit on transformed scales (e.g. log for cell counts, logit for fractions) to normalise residuals, and this uncertainty is compressed when one looks at fits on the observed scale (e.g. linear). Another issue in our case is that for each model (precursor, target, and mode of recruitment) we fit 6 time courses simultaneously. Visual comparisons of fits of different models can then be a little difficult or misleading; apparently small differences in each fitted timecourse can add up to quite significant changes in the combined likelihood. We added this to the Discussion.
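      As a rough numerical illustration of these two effects, the Python sketch below uses made-up numbers (none are taken from the paper). It shows how a gap that looks small on the observed scale can be large on the logit scale, and how modest log-likelihood gains in each of six time courses add up.

```python
import math


def logit(p):
    """Log-odds transform used when fitting fractions."""
    return math.log(p / (1.0 - p))


# Hypothetical predictions of a labelled fraction from two competing models.
pred_a, pred_b = 0.02, 0.03
linear_gap = abs(pred_b - pred_a)               # gap on the observed scale
logit_gap = abs(logit(pred_b) - logit(pred_a))  # gap on the fitting scale

# Hypothetical log-likelihood gain of one model over another in each of
# six simultaneously fitted time courses.
per_timecourse_gain = [0.5] * 6
total_gain = sum(per_timecourse_gain)
likelihood_ratio = math.exp(total_gain)

print(f"linear gap: {linear_gap:.3f}, logit gap: {logit_gap:.3f}")
print(f"combined log-likelihood gain: {total_gain:.1f}")
print(f"combined likelihood ratio: {likelihood_ratio:.1f}")
```

      With these illustrative values, a difference of 0.01 on the linear scale is more than ten times larger on the logit scale, and six gains of 0.5 log units combine into a likelihood ratio of about 20.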

      The number of models is combinatorial (Fig. 4) so showing them all seems a bit cumbersome. But now in the supporting information (Fig. S3), for each target we show the best, second best, and the worst model fits overlaid, to give a sense of the dynamic range of the models we considered. As you will now see, visual differences among the most strongly supported models were not huge (but refer to our point just above). Measures of out-of-sample prediction error (LOO-IC) discriminated between these models reasonably well, though (weights shown in Fig. 4).

      It’s also worth mentioning here that we have substantially greater confidence in the identity of the precursors than in the precise modes of recruitment - you can see this clearly in the groupings of weights in Figure 4A. We did comment on this in the text but now emphasise it more.

      (6) Figure 4 left me unclear about exactly which combinations of precursors and targets were considered. Figure 3 implies there are 5 precursors but in Figure 4A at most 4 are considered. Also, Figure 4B suggests skin CD69- were considered a target. This doesn't seem to be specified anywhere.

      Thanks for pointing this out. When we were considering CD4+ EM in bulk as target, this population includes CD69- cells; in those fits, therefore, we couldn't use CD69- as a precursor. We now clarify this in the caption. Thanks also for the observation about Figure 4B; we didn’t consider CD69- cells as a target, so we’ve also made that clearer.

      Reviewer #2 (Public review):

      This manuscript addresses a fundamental problem of immunology - the persistence mechanisms of tissue-resident memory T cells (TRMs). It introduces a novel quantitative methodology, combining the in vivo tracing of T-cell cohorts with rigorous mathematical modeling and inference. Interestingly, the authors show that immigration plays a key role in maintaining CD4+ TRM populations in both skin and lamina propria (LP), with LP TRMs being more dependent on immigration than skin TRMs. This is an original and potentially impactful manuscript. However, several aspects were not clear and would benefit from being explained better or worked out in more detail.

      (1) The key observations are as follows:

      a) When heritably labeling cells due to CD4 expression, CD4+ TRM labeling frequency declines with time. This implies that CD4+ TRMs are ultimately replenished from a source not labeled, hence not expressing CD4. Most likely, this would be DN thymocytes.

      That’s correct.

      b) After labeling by Ki67 expression, labeled CD4+ TRMs also decline - This is what Figure 1B suggests. Hence they would be replaced by a source that was not in the cell cycle at the time of labeling. However, is this really borne out by the experimental data (Figure 2C, middle row)? Please clarify.

      (2) For potential source populations (Figure 2D): Please discuss these data critically. For example, CD4+ CD69- cells in skin and LP start with a much lower initial labeling frequency than the respective TRM populations. Could the former then be precursors of the latter?

      A similar question applies to LN YFP+ cells. Moreover, is the increase in YFP labeling in naïve T cells a result of their production from proliferative thymocytes? How well does the quantitative interpretation of YFP labeling kinetics in a target population work when populations upstream show opposite trends (e.g., naïve T cells increasing in YFP+ frequency but memory cells in effect decreasing, as, at the time of labeling, non-activated = non-proliferative T cells (and hence YFP-) might later become activated and contribute to memory)?

      These are good (and related) points. We've added some text to the discussion, paragraphs 2 and 3; we reproduce it here, slightly expanded.

      Fig 1B was a schematic but did faithfully reflect the impact of any waning of YFP in the precursor on its kinetics in the targets. However, in our experiments, as you noted, the kinetics of YFP in most of the precursor populations were quite flat. This was due in part to memory subsets being sustained by the increasing levels of YFP within naïve cells from the cohort of thymocytes labeled during treatment. There is also likely some residual permanent labeling of lymphocyte progenitor populations. We discussed this in Lukas Front Imm 2023. (The latter is not a problem; all that matters for our analysis is that we generate a reasonable empirical description of the label kinetics in naive cells, however it arises). YFP is therefore not cleanly washed out in the periphery; and so for models with circulating memory as the tissue precursor, the flatness of their YFP curves leads to rather flat curves in the tissues.

      The mTom labelling was more informative as it was clearly diluted out of all peripheral populations by mTom-negative descendants of thymically-derived cells, as you point out in (a).

      Regarding (2), re: interpreting the initial levels of labels in precursors and targets. The important point here is that YFP and mTom were induced quickly in all populations we studied; therefore our inferences regarding precursors and targets aren’t informed by the initial levels of label in each. (Imagine a slow precursor feeding a rapidly dividing target; YFP levels in the former would start lower than those in the latter). The causal issue that we think you’re referring to would matter if one expects the targets to begin with no label at all; for instance, in our busulfan chimeric mouse model (e.g. Hogan PNAS 2015) new, thymically derived ‘labelled’ (donor) cells progressively infiltrate replete ‘unlabelled’ (host) populations. In that case, one can immediately reject certain differentiation pathways by looking at the sequence of accrual of donor cells in different subsets.

      The trends in YFP and mTom frequencies after treatment do matter for pathway inference, though, because precursor kinetics must leave an imprint on the target. For the case you mentioned, with opposite trends in label kinetics, such models would be unlikely to be supported strongly; indeed, we never saw strong support for naïve cells (strongly increasing YFP) as a direct precursor of TRM (fairly flat).

      We’ve added a condensed version of this to the Discussion.

      (3) Please add a measure of variation (e.g., suitable credible intervals) to the "best fits" (solid lines in Figure 2).

      Added.

      (4) Could the authors better explain the motivation for basing their model comparisons on the Leave-One-Out (LOO) cross-validation method? Why not use Bayesian evidence instead?

      Bayes factors are very sensitive to priors and are either computationally unstable if calculated with importance sampling methods, or very expensive to calculate, if one uses the more stable bridge sampling method. (We also note that fitting just a single model here takes a substantial amount of time). Further, using BF can be unreliable unless one of the models is close to the 'true' data generating model; though they seem to work well, we can be sure that none of our models are! For us, a more tractable and real-world selection criterion is based on the usefulness of a model, for which predictive performance is a reasonable proxy. In this case the mean out-of-sample prediction error (which LOO-IC reflects) is a well-established and valid means of ascribing support to different models.
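      For readers unfamiliar with how LOO-IC values translate into relative support, the Python sketch below computes Akaike-type weights, in which each model's weight is proportional to exp(-LOO-IC/2). This is an assumption made for illustration: the response does not state which weighting scheme was used for Figure 4 (stacking is a common alternative), and the model names and LOO-IC values here are hypothetical.

```python
import math


def looic_weights(looic):
    """Akaike-type model weights from LOO-IC values (lower is better).

    Each weight is proportional to exp(-0.5 * (LOO-IC - best LOO-IC)).
    Subtracting the best value first keeps the exponentials well scaled.
    """
    best = min(looic.values())
    relative = {name: math.exp(-0.5 * (value - best))
                for name, value in looic.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Hypothetical LOO-IC values, for illustration only (not from the paper).
example = {"model_A": 100.0, "model_B": 101.0, "model_C": 110.0}
weights = looic_weights(example)

for name, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {w:.3f}")
```

      Under this scheme, a LOO-IC difference of 1 between two models corresponds to a weight ratio of exp(0.5), about 1.65, so models with similar predictive performance share support fairly evenly.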

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, the fusion between the gonad and genital disc is necessary for the development of continuous testes and seminal vesicle tissue essential for fertility. To do this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility. 

      Strengths: 

      This is a rigorous and comprehensive study that identifies the Hamlet-dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion. 

      We appreciate that the reviewer thinks our study is rigorous.

      Weaknesses: 

      The image resolution and presentation of figures is a major issue in this study. As a nonexpert, it is nearly impossible to see the morphological changes as described in the results. Quantification of all cell biological phenotypes is also lacking therefore reducing the impact of this study to those familiar with tissue fusion events in Drosophila development. 

      In the revised version, we have improved the image presentation and resolution. For all the images with more than two channels, we included single-channel images, changed the green color to lime and the red to magenta, and highlighted the testis (TE) and seminal vesicles to make morphological changes more visible.

      We had quantification for marker gene expression in the original version, and have now also included quantification for cell biological phenotypes, which generally show 100% penetrance.

      Reviewer #2 (Public review): 

      Strengths: 

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain-containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream pathways they characterized, which are Wnt2 and Toll. 

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as the dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level. 

      We appreciate that the reviewer thinks our study is original and important.

      Weaknesses: 

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects fusion, as myotubes might create the first "bridge" bringing the epithelia together. The authors clearly showed that their driver tools do not cause expression in myoblasts/myotubes, but this does not exclude non-tissue autonomous defects in their RNAi screen. Nevertheless, this is outside the scope of this work. 

      We thank the reviewer for considering non-tissue autonomous defects upon gene knockdown. The driver, hamRSGal4, drives reporter gene expression mainly in the RS epithelia, but we did observe weak expression of the reporter in the myoblasts before they differentiate into myotubes. Thus, we could not rule out a non-tissue autonomous effect in the RNAi screen. We have therefore included a statement in the Results: “Given that the hamRSGal4 driver is highly expressed in the TE and SV epithelia, we expect highly effective knockdown occurs only in these epithelial cells. However, hamRSGal4 also drives weak expression in the myoblasts before they differentiated into myotubes (Supplementary Fig. 5B), which may result in a non-tissue autonomous effect when knocking down the candidate genes expressed in myoblasts.”

      However, one point that could be addressed in this study: the RNAseq and CUT&TAG experiments would profit from adding principal component analyses, elucidating similarities and differences of the diverse biological and technical replicates. 

      Thanks for the suggestion. We have now included the PCA in Supplementary Figure 6A-B and the corresponding description in the text. The PCA graphs validated the consistency between biological replicates of the RNA-seq samples. The Cut&Tag graphs confirm the consistency between the two biological replicates from the GFP samples, but show a higher variability between the w1118 replicates. Importantly, we only considered the overlapping peaks pulled down by the GFP antibody from the ham_GFP genotype and the Ham antibody from the wildtype (w1118) sample as true Ham binding sites.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      Major Concern: 

      (1) The image resolution and presentation of figures (Figures 2, 5, 6, and 7) is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Images need to be captured at higher resolution and zoomed in with arrows denoting changes as described. Individual channels, particularly for intensity measurement need to be shown in black and white in addition to merged images. Images also need pseudo-colored for color-blind individuals (i.e. no red-green staining). 

      The images were captured at a high resolution, but somehow the resolution was dramatically reduced in the bioRxiv PDF. We tried to overcome this by directly submitting the PDF in the eLife submission system. In the revised version, we have included single-channel images and changed the green and red colors to lime and magenta for color blindness. We also highlighted the testis (TE) and seminal vesicle structures in the images to make morphological changes more visible.

      (2) The penetrance of morphological changes observed in RT development is also unclear and needs to be rigorously quantified for data in Figures 2, 5, and 7. 

      We have now included quantification of the cell biological phenotypes, which generally show 100% penetrance. The penetrance and the number of animals used are indicated in each corresponding image.

      Reviewer #2 (Recommendations for the authors): 

      Major Points 

      (1) Lines 193- 220 I would strongly suggest pointing out the obvious shape defects of the testes visible in Figure 2A ("Spheres" instead of "Spirals"). These are probably a direct consequence of a lack in the epithelial connection that myotubes require to migrate onto the testis (in a normal way) as depicted in the cartoons, allowing the testis to adopt a spiral shape through myotube-sculpting (Bischoff et al., 2021), further confirming the authors' findings! 

      Good point. In the revised text, we have added more description of the testis shape defects and pointed out a potential contribution from compromised myotube migration.   

      (2) Line 216: "Often separated from each other". Here it would be important to mention how often. If the authors cannot quantify that from existing data, I suggest carrying it out in adult/pharate adult genital tracts (if there is no strong survivor bias due to the lethality of stronger affected animals), as this is much easier than timing prepupae. This should be a quick and easy experiment. 

      Because it is hard to tell whether the separation of the SV and TE was caused by developmental defects or was sometimes due to technical issues (bad dissection), we have now changed the description to, “control animals always showed connected TE and SV, whereas ham mutant TE and SV tissues were either separated from each other, or appeared contacted but with the epithelial tubes being discontinuous (Fig. 2B).” Additionally, we quantified the disconnection phenotype, which shows 100% penetrance in 18 mutant animals. This quantification is now included in the figure.

      (3) Lines 289-305, Figure 3. I could only find how many replicates were analyzed in the RNAseq/CUT&Tag experiments in the Material & Methods section. I would add that at least in the figure legends, and perhaps even in the main text. Most importantly, I would add a Principal Component Analysis (one for RNAseq and one for the CUT&TAG experiment), to demonstrate the similarity of biological replicates (3x RNaseq, 4x Cut&Tag) but also of the technical replicates (RNAseq: wt & wt/dg, ham/ham & ham/df, GD & TE; CUT&TAG: Antibody & GFP-Antibody, TG&TE...). This should be very easy with the existing data, and clearly demonstrate similarities & differences in the different types of replicates and conditions. 

      Principal component analysis and its description have now been added to Supplementary Fig. 6 and the main text, respectively.

      (4) Line 321; Supplementary Table 1: In the table, I cannot find which genes are down- or upregulated - something that I think is very important. I would add that, and remove the "color" column, which does not add any useful information. 

      In Supplementary table 1, the first sheet includes upregulated genes while the second sheet includes downregulated genes. We removed the column “color” as suggested.  

      (5) Line 409: SCRINSHOT was carried out with candidate genes from the screen. One gene I could not find in that list was the potential microtubule-actin crosslinker shot. If shot knockdown caused a phenotype, then I would clearly mention and show it. If not, I would mention why shot is important, nonetheless.

      shot is one of the candidate target genes selected from our RNA-seq and Cut&Tag data. However, in the RNAi screen, knocking down shot with the available RNAi lines did not cause any obvious phenotype. This could be due to inefficient RNAi knockdown or redundancy with other factors. We nevertheless wanted to examine the shot expression pattern in the developing RS, given the important role of shot in epithelial fusion (Lee S., 2002). Using SCRINSHOT, we could detect epithelial-specific expression of shot, implying its potential function in this context. We have now revised the text to clarify this point.

      Minor points 

      (1) Cartoons in Figure 1: The cartoons look like they were inspired by the cartoon from Kozopas et al., 1998 Fig. 10 or Rothenbusch-Fender et al., 2016 Fig 1. I think the manuscript would greatly profit from better cartoons, that are closer to what the tissue really looks like (see Figure 1H, 2G), to allow people to understand the somewhat complicated architecture. The anlagen of the seminal vesicles/paragonia looks like a butterfly with a high columnar epithelium with a visible separation between paragonia/seminal vesicles (upper/lower "wing" of the "butterfly"). Descriptions like "unseparated" paragonia/seminal vesicle anlagen, would be much easier to understand if the cartoons would for example reflect this separation. It would even be better to add cartoons of the phenotypic classes too, and to put them right next to the micrographs. (Another nitpick with the cartoons: pigment cells are drastically larger and fewer in number (See: Bischoff et al., 2021 Figure 1E & MovieM1).) 

      Thanks for the suggestion. We have updated Figure 1 by adding additional illustrations showing the accessory gland and seminal vesicle structures in the pupal stage and changing the size of pigment cells.

      (2) Line 95-121 I would also briefly introduce PR domains, here. 

      We have added a brief description of the PR domains.

      (3) Line 152, 158, 160, 162. When first reading it, I was a bit confused by the usage of the word sensory organ. I would at least introduce that bristles are also known as external mechanosensory organs. 

      We have now revised the description to “mechano-sensory organ”.

      e.g. Lines 184, 194, and many more. Most times, the authors call testis muscle precursors "myoblasts". This is correct sometimes, but only when referring to the stage before myoblast-fusion, which takes place directly before epithelial fusion (28 h APF). Post-myoblast-fusion (e.g. during migration onto the testis), these cells should be called myotubes or nascent myotubes, as the fly muscle community defined the term myoblast as the single-nuclei precursors to myotubes.

      We have now revised the description accordingly.  

      (4) Line 217/Figure 2B. It looks like there is a myotube bridge between the testis and the genital disc. I would point that out if it's true. If the authors have a larger z-stack of this connection, I suggest creating an MIP, and checking if there are little clusters of two/three/four nuclei packed together. This would clearly show that the cells in between are indeed myotubes (granted that loss of ham does not introduce myoblast-fusion-defects). 

      We do not have a Z-stack of this connection, and thus cannot confirm whether the cells in this image are myotubes. However, we found that myotubes can migrate onto the testis and form the muscular sheet in the ham mutant despite reduced myotube density. At the junction there are myotubes, suggesting that loss of ham does not introduce myoblast-fusion defects. These results are now included in the revised manuscript, Supplementary Fig. 5C-D.

      (5) Line 231/Supplementary Fig. 3C-G: I would add to the cartoons, where the different markers are expressed. 

      We have added marker gene expression in the cartoons.

      (6) Line 239. I don't see what Figure 1A/1H refers to, here. I would perhaps just remove it. 

      Yes, we have removed it.

      (7) Line 232. I would rephrase the beginning of the sentence to: Our data suggest Ham to be... 

      Yes, we have revised it.

      (8) Line 248-250/Figure 2F. Clonal analyses are great, but I think single channels should be shown in black and white. Also, a version without the white dashed line should be shown, to clearly see the differences between wt and ham-mutant cells. 

      Now single channel images from the green and red images are presented in Supplementary Figures. This particular one is in Supplementary Figure 3B. 

      (9) Line 490. The Toll-9 phenotype was identified on the sterility effect/lack-of-sperm phenotype alone, and it was deduced that this suggests connection defects. By showing the right focus plane in Fig S8B (lower right), it should be easy to directly show whether there is a connection defect or not. Also, one would expect clearer testis-shaping defects, like in ham-mutants, as a loss of connection should also affect myotube migration to shape the testis. This is just a minor point, as it only affects supplementary data with no larger impact on the overall findings, even if Toll-9 is shown not to have a defect, after all.

      We find that scoring defects at the junction site at the adult stage is difficult and may not always be accurate. Instead, we score the presence of sperm in the SV, which indirectly but firmly suggests successful connection between the TE and SV. We have now included a quantification graph showing the penetrance of the phenotype in the new Supplementary Fig. 14C. There were indeed morphological defects of the TE in Toll-9 RNAi animals. We have now included the image and quantification in the new Supplementary Fig. 14B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to the public reviews:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs), and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Fig2 FigSupp1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan: Phactrs and PPP1R15A/B indeed have W at this position, but Neurabin/Spinophilin contain VDP, which makes similar interactions. Similarly, the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments and suggestions for revisions

      (1) The authors do not provide strong evidence that the interaction of the 'W' of the RVxF-øø-R-W string with the hydrophobic groove of PP1 is conserved in PIPs. Whereas the RVxF motif is well conserved and validated since its discovery in 1997, as are the øø (an extension of the RVxF motif) and the 'R', the Trp residue in the RVxF-øø-R-W string is not conserved.

      We did not mean to imply that the W motif is conserved amongst all PIPs.

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs). Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs share a trajectory across the PP1 surface that encompasses not only the RVxF-ΦΦ-R SLIMs, but also additional sequences C-terminal to the R SLIM (Chen et al, eLife, 2015). This trajectory is also shared by the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on this structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134 (See Fedoryshchak et al, 2020, Figure 1 figure supplement 2).

      Introduction, paragraph 2 is rewritten to make this clearer.

      The sequence and positions of W differ in amino acid type and position relative to the RVxF-øø-R string.

      The motif ‘W’ does not mandate tryptophan; it is our name for a common structurally aligned motif. Although the Phactrs and PPP1R15A/B indeed have W at this position, Neurabin and spinophilin contain VDP, which nevertheless makes similar interactions. Similarly, the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In the Discussion the authors state that the hydrophobic groove of PP1 is remodelled by Neurabin. However, details of this are not described or shown in the manuscript.

      The shared trajectory determined by the RVxF-øø-R-W string brings the sequences C-terminal to the W SLIM into the vicinity of the PP1 hydrophobic groove. In the Phactr1/PP1 holoenzyme this generates a novel pocket required for substrate recognition (Fedoryshchak et al, 2020). These observations raised the possibility that sequences C-terminal to the “W” motif in the other RVxF-øø-R-W PIPs also play a role in substrate recognition.

      Introduction paragraph 3 now cites a new Figure 1-S2, which shows how the hydrophobic groove is remodelled in the various different PIP/PP1 complexes. A revised Figure 1A now indicates the hydrophobic residues defining the hydrophobic groove by grey shading.

      (2) To add to the confidence of the structure, the authors should include a 2Fo-Fc simulated annealing omit map, perhaps showing the R and W interactions of the RVxF-øø-R-W string.

      This is now included as new Figure 6 Figure supplement 1. Note that in Neurabin, the W motif is VDP, where the valine and proline sidechains interact similarly to the tryptophan (see also new Figure 1-S2G,H).

      We also add a new supplementary Figure 6-S1 comparing our PBM-liganded Neurabin PDZ domain with the previously published unliganded structure (Ragusa et al 2010).

      (3) Page 16. The authors state that spinophilin remodels the PP1 hydrophobic groove differently from Phactrs. Arguably spinophilin does not remodel the PP1 hydrophobic groove at all. There are no contacts between spinophilin and the PP1 hydrophobic groove in the spinophilin-PP1 structure, correlating with the absence of 'W' in the RVxF-øø-R-W string in spinophilin.

      The VDP sequence corresponding to the W motif in spinophilin and neurabin makes analogous contacts to those made by the W in Phactr1 (see Fedoryshchak et al 2020).

      Remodelling is meant in the sense of altering the structure of the major groove by bringing new sequences into its vicinity rather than necessarily directly interacting with it. The spinophilin/PP1 and Phactr/PP1 hydrophobic grooves are compared in new Figure 1-S2 (see also Fedoryshchak et al 2020, Figure 2 figure supplement 1).

      (4) Page 8. For the cell-based/proteomics-dephosphorylation assay in Figure 2, it isn't clear why there were no dephosphorylation sites detected for the PPP1R15A/B-PP1 fusion (except PPP6R1 S531 for PPP1R15B). One might have expected a correlation with PP1 alone. Does this imply that PPP1R15A/B are inhibiting PP1 catalytic activity? Was the activity tested in vitro?

      The R15A/B data are compared to average abundance of all the phosphosites in the dataset, including those of PP1.

      We have not tested for a general inhibitory effect of R15A/B on PP1 activity. Many PIPs, including R15A/B, do occlude one or more of the PP1 substrate grooves and therefore generally act as inhibitors of PP1 activity against some potential substrates, while enhancing activity against others.

      Other points 

      (4) Figure S1: Colour sequence similarities/identities.

      Done

      (6) Figures: Structure figures lacked labels:

      Figure 1A, label PP1, Phactrs etc.

      Done

      Figure 6, label PP1, Neurabin, previous Neurabin structure (Fig. 6C), hydrophobic groove, PDZ domain, etc.

      Done

      (7) Statistical analysis. p values should be shown for data in:

      Figure 5.

      To avoid cluttering the Figure, a new sheet, “statistical significance” has been added to Supplementary Table 3, summarizing the analysis.

      Figure 1.

      Figure amended (now figure 1-S1).

      (8) Some inconsistency with labels, eg '34-WT' used in Fig. 5C, whereas '34A-WT' (better) in Methods.

      Now changed to 34A etc where used.

      (9) Page 6. PPP1R9A/B is not shown in Figure 1A and Figure S1A.

      PPP1R9A/B are Neurabin and spinophilin - now clarified in Introduction paragraph 2, Results paragraph 1, Discussion paragraph 1.

      (10) Page 7: lines 4, 'site' not 'side'.

      Done

      (11) Page 9: DTL and CAMSAP3 were found to be dephosphorylated in the PP1-Neurabin/spinophilin screen. Are these PDZ-binding proteins?

      Neither DTL nor CAMSAP3 contains the C-terminal hydrophobic residues characteristic of classical PBMs. Sentence added in Discussion, paragraph 5.

      (12) Page 12 and Figure 5 and S5: The synthetic p4E-BP1 and IRSp53WT peptides with PBM should be given more specific names to indicate the presence of the PBM.

      We have renamed 4E-BP1<sup>WT</sup> and IRSp53<sup>WT</sup> to 4E-BP1<sup>PBM</sup> and IRSp53<sup>PBM</sup> respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides.

      Text, Figure 5, and Figure S5 all revised accordingly.

      (13) Give PDB code for spinophilin-PP1 complex coordinates shown in Figure 6C.

      PDB codes for the various PIP/PP1 complexes now given in new Figure 1-S2 and revised Figure 6C.

      Reviewer #2 (Recommendations for the authors):

      The work undertaken by the authors is extensive and robust, however, I believe that some improvement in the writing and some detailed explanation of certain results sections would help with the presentation of the work and clarity for the readers.

      (1) The introduction should contain more information about the interaction between PP1 and Neurabin, given that this is the focus of the paper. This would give the reader the necessary background required to follow the paper.

      Introduction paragraph 2 revised to describe the different SLIMs in more detail. New Figure 1-S2 shows detail of the different remodelled hydrophobic grooves in the various PIP/PP1 complexes.

      (2) More information on PP1-IRSp53L460A has to be added before discussing results in S1B.

      Sentence explaining that IRSp53 L460 docks with the remodelled PP1 hydrophobic groove in the Phactr1/PP1 holoenzyme added in Results paragraph 2.

      (3) Page 6: "as expected, the +5 residue L460A mutation, which impairs dephosphorylation by the intact Phactr1/PP1 holoenzyme, impaired sensitivity to all the fusions, indicating that they recognise phosphorylated IRSp53 in a similar way (Figure S1B)". Statistics between IRSp53 and IRSp53L460A across PP1-PIPs need to be conducted before concluding the above. From the graph and the images, the impairment to dephosphorylation is not convincing.

      For each of the four PP1-Phactr fusions, the IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide (p<0.05 for each fusion).

      Since the proteomics studies in Figure 2 show that the substrate specificity of the four PP1-Phactr1 fusions is virtually identical, we combined the data for the four different fusions. The IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide in this analysis (p< 0.0001). This result is shown in the revised Figure S1B and legend.

      (4) mCherry-4E-BP1(118+A), in which an additional C-terminal alanine should still allow TOS-mediated phosphorylation, but prevent PDZ interaction. Does 4EBP1 (118+A) actually prevent interaction between PP1-Neurabin? This interaction needs to be validated, especially since spinophilin was shown to bind to multiple regions of PP1.

      It is not clear what the referee is asking for here. The biochemical analysis in Figure 4C shows that the C-terminus of 4E-BP1 constitutes a classical PBM. The X-ray crystallography in Figure 6 confirms this, demonstrating H-bond interactions between the 4E-BP1 C-terminal carboxylate and main chain amides of L514, G515 and I516.

      We consider the possibility that the 4E-BP1(118+A) mutant inhibits the activity of PP1-neurabin via a mechanism other than direct blocking of the 4E-BP1/PDZ interaction to be unlikely for the following reasons:

      (1) Addition of a C-terminal alanine will disrupt the PBM interaction because the extra residue sterically blocks access to the PBM-binding groove. This is the most parsimonious explanation, and is based on our solid structural and biochemical evidence that the 4E-BP1 C-terminus is a classical PBM.

      (2) Alphafold3 modelling predicts Neurabin PDZ / 4E-BP1 PBM interaction with high confidence (shown in Figure 6-S2E), but it does not predict any PDZ interaction with 4E-BP1(118+A). Note added in Figure 6-S2 legend.

      (3) Recognition of the 4E-BP1(118+A) mutation without loss of binding affinity would require that the mutant be capable of binding in a manner formally equivalent to recognition of an “internal” PDZ-binding peptide. Recognition of such “internal peptides” is dependent on their adopting a specifically constrained conformation, which typically requires reorganisation of the PDZ carboxylate-binding GLGF loop. Such “internal site” recognition typically involves more than one residue C-terminal to the conventional PDZ “0” position (see Penkert et al NSMB 2004, doi:10.1038/nsmb839; Gee et al JBC 1998, DOI: 10.1074/jbc.273.34.21980; Hillier et al 1999, Science PMID: 10221915).

      (5) It is nice to see that the various PP1-Phactr fusions have around 60% substrate overlap between them. Would it be possible to compare these results with previously published mass spec data of Phactr1XXX from the group? There is mention of some substrates being picked up, but a comparison much like in Figure 2E would be more informative about the extent to which the described method captures relevant information.

      This is difficult to do directly as the PP1-Phactr fusion data are from human cells while those in Fedoryshchak et al 2020 are from mouse.

      However, manual curation shows that of the 28 top hits seen in our previous analysis of Phactr1XXX in NIH3T3 cells, 18 were also detectable in the HEK293 system; of these, 13 were also detected as PP1-Phactr fusion hits. Data summarised in new Figure 2-S1C. Text amended in Results, “Proteomic analysis...”, paragraph 2.

      (6) Figure 3D Why are the levels of pT70, pT37/46 and total protein in vector controls much lower as compared to 0nM Tet in PP1-Neurabin conditions? It is also weird that given total protein is so low, why are the pS65/101 levels high compared to the rest?

      We think it likely that these phenomena reflect low-level expression of PP1-Neurabin in uninduced cells. This is now noted in the Figure 3D legend, and basal PP1-Neurabin expression is shown in new Figure 3-S1C. This alters the relative levels of the different species detected by the total 4E-BP1 antibody in favour of the faster migrating forms, which are less phosphorylated than the slower ones, and the total amount increases about 2-fold (Figure 3D, compare 0nM Tet lanes).

      The altered pS65/101-pT70 ratio is also likely to reflect the leaky PP1-Neurabin expression, since the relative intensities of the various phosphorylated species are dependent on both the relative rates of phosphorylation and dephosphorylation. Expression of a phosphatase would therefore be expected to differentially affect the phosphorylation levels of different sites according to their reactivity.

      (7) Figure 3E: Does inhibiting mTORC further reduce translation when PP1-Neurabin is expressed? If this is the case, this might suggest that they might not necessarily be mTORC inhibitors?

      We have not done this experiment. Since Rapamycin cannot be guaranteed to completely block 4E-BP1 phosphorylation, and PP1-Neurabin cannot be guaranteed to completely dephosphorylate 4E-BP1, any further reduction upon their combination would be hard to interpret.

      (8) Substrate interactions with the remodelled PP1 hydrophobic groove do not affect PP1-Neurabin specificity. Is there evidence that PP1-Neurabin remodels the hydrophobic groove? Is it not possible that Neurabin does not remodel the PP1 groove to begin with and hence there is no effect observed with the various mutants? If this is not the case, it should be explained in a bit more detail.

      Comparison of the Neurabin/PP1 and Phactr1/PP1 structures shows that the hydrophobic groove is remodelled differently in the two complexes. Now shown in new Figure 1-S2B,C,G.

      (9) Figure 5B has a lot of interesting information, which I believe has not been discussed at all in the results section.

      To help interpretation of the enzymology in Figure 5 we have renamed 4E-BP1WT and IRSp53WT to 4E-BP1PBM and IRSp53PBM respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides. Text in Results, “PDZ domain interaction…”, paragraph 1, and Figures 5 and S5 revised accordingly.

      Why does the 4E-BP1Mut affect catalytic efficiency of PP1 alone when compared with WT, while no difference is observed with IRSp53WT and mutant?

      We do not understand the basis for the differential reactivity of 4E-BP1PBM and 4E-BP1MUT with PP1 alone; we suspect that it reflects the hydrophobicity change resulting from the MDI -> SGS substitution. However, this is unlikely to be biologically significant as PP1 is sequestered in PIP-PP1 complexes.

      Importantly, the two PP1 fusion proteins behave consistently in this assay – the presence of the intact PBM increases reactivity with PP1-Neurabin, but has no effect on dephosphorylation by PP1-Phactr1.

      Why does PP1 alone not have a difference between IRSp53WT and mutant, while PP1-Neurabin does have a difference?

      This is due to the presence of the PBM in IRSp53WT (now renamed IRSp53PBM), which increases affinity for PP1-Neurabin, but not PP1 alone. Likewise, PP1-Phactr1, which does not possess a PDZ domain, is also unaffected by the integrity of the PBM.

      (7) “Strikingly, alanine substitutions at +1 and +2 in 4E-BP1WT increased catalytic efficiency by both fusions, perhaps reflecting changes at the catalytic site itself (Figure 5E, Figure S5E)”. This could be expanded upon, because this suggests a mechanism that makes the substrate refractory to PDZ/hydrophobic groove remodelling?

      We favour the idea that this reflects a requirement to balance dephosphorylation rates between the multiple 4E-BP1 phosphorylation sites, especially if multiple rounds of dephosphorylation occur for each PBM—PDZ interaction. Additional sentences added in Discussion paragraph 7.

      (8) Typographical errors and minor comments:

      a) PIPs can target PP1 to specific subcellular locations, and control substrate specificity through autonomous substrate-binding domains, occupation or extension of the substrate grooves, or modification of PP1 surface electrostatics.

      b) Phosphophorylation side site abundances within triplicate samples from the same cell line were comparable between replicates (Figure 2B).

      c) While the alanine substitutions had little effect, conversion of +4 to +6 to the IRSp534E-BP1 sequence LLD increased catalytic efficiency some 20-fold (Figure 5C, Figure S5C). 

      d) Figure 3E labels are not clear. The graph can be widened to make the labels of the conditions clearer.

      All corrected

      Reviewer #3 (Recommendations for the authors):

      This was a very well-written manuscript.

      However, I was looking for a summary mechanistic figure or cartoon to help me navigate the results.

      I noted a few typos in the text.

      New summary Figure 5-S2 added, cited in Results, and discussed in Discussion paragraphs 6 and 7.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article presents a meta-analysis that challenges established abundance-occupancy relationships (AORs) by utilizing the largest known bird observation database. The analysis yields contentious outcomes, raising the question of whether these findings could potentially refute AORs.

      We thank the Reviewer for their positive comments.

      Strengths:

      The study employed an extensive aggregation of datasets to date to scrutinize the abundance-occupancy relationships (AORs).

      We thank the Reviewer for their positive comments.

      Weaknesses:

      While the dataset employed in this research holds promise, a rigorous justification of the core assumptions underpinning the analytical framework is inadequate. The authors should thoroughly address the correlation between checklist data and global range data, ensuring that the foundational assumptions and potential confounding factors are explicitly examined and articulated within the study's context.

      We thank the Reviewer for these comments. We agree that the core assumptions that form the foundation of our methods need more justification and transparency. In our revised version, we have taken the following steps to achieve this:

      - Altered the title to be more explicit about the core assumptions, which now reads: “Local-scale relative abundance is decoupled from global range size”

      - We have added more details on why and how we treat global range size as a measure of ‘occupancy.’

      - We have added a section that discusses the limitations of using eBird relative abundance.

      Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species when studied across their range tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from Birdlife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or other direction.

      We thank the Reviewer for these comments.

      Strengths:

      The study raises a dormant question, with a large dataset.

      We thank the Reviewer for these comments. We intended to take a longstanding question and attempt to apply novel datasets that were not available mere decades ago. While we do not imply that we have ‘solved’ the question, we hope this work highlights the potential for further interrogation using these large datasets.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      We agree that there is a heterogeneous collection of data across many habitats, taxa, and observations. However, we see this as a significant strength rather than a weakness. Our work assumes that we are averaging over this variability to assess a large-scale pattern in the relationship. This was potentially a limitation of previous work, whose large datasets were often focused on particular contexts (e.g., much work focused solely on the UK), which we believe could limit its generalizability. However, the reviewer makes a fair point regarding the heterogeneity of data collection. We have now added text to the discussion that is explicit about this; see the new section named “Potential limitations of current work and future work”, which includes: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, synthesizing observations of potentially heterogeneous locations, context and quality.”

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      See comment below.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      See comment below.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      See comment below.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      See comment below. We also added a sentence to the methods stating that we did not remove alien ranges and explaining why. Still, we acknowledge the dramatic changes in populations and environments over the past 50 years (see the new section “Potential limitations of current work and future work”).

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      We agree that all of the points above are interesting data explorations. As said above, our main purpose was to highlight the potential for further interrogation using these large datasets. However, we have added some additional text in the discussion that explicitly mentions/encourages these additional data explorations. We hope people will pick up on the potential for these data and explore them further.

      Third, biases. I am not conversant with ebird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on ebird but then creating densities within grids and surely comes with similar issues.

      We agree that if we were interested in the absolute abundance of a given species, the local number on an eBird checklist would be a poor representation. However, our study aims not to estimate absolute abundance but to examine relative abundance among species on each checklist. By focusing on relative abundance, we leverage eBird data's strengths in detecting the presence and frequency of species across diverse locations and times, thereby capturing community composition trends that can provide meaningful insights despite individual checklist biases. This approach allows us to assess the comparative prominence of species in the community as reported by the observer, providing a consistent metric of relative abundance. Despite detectability biases, the structure of eBird checklists reflects the observer’s encounter rates with each species under similar conditions, offering a valuable snapshot of relative species composition across sites and times. Our key assumption is that the biases discussed are not directional and are therefore random throughout the sampling process, which would translate to no ‘real’ bias in our effect size of interest.
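      As a minimal sketch of the per-checklist quantity described above, the snippet below converts counts on one checklist to relative abundances and correlates them with global range size. It is an illustration only, not the authors' pipeline (which is available in their repository). The log-log Pearson correlation, the function names, and the toy species, counts, and range sizes are all assumptions made for this example.

```python
from math import log10, sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def relative_abundance(counts):
    """Share of all individuals on one checklist that belong to each species."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def checklist_correlation(counts, range_km2):
    """Correlate log relative abundance with log global range size.

    Returns the correlation and the number of species on the checklist.
    The log-log Pearson form is an assumption made for illustration.
    """
    rel = relative_abundance(counts)
    species = sorted(rel)
    x = [log10(rel[s]) for s in species]
    y = [log10(range_km2[s]) for s in species]
    return pearson(x, y), len(species)


# Hypothetical checklist and range sizes, chosen so that the log-log
# relationship is perfectly linear (a sanity check, not real data).
checklist = {"sp_a": 1, "sp_b": 10, "sp_c": 100}
ranges = {"sp_a": 1e4, "sp_b": 1e5, "sp_c": 1e6}
r, n_species = checklist_correlation(checklist, ranges)
```

      Because every count on a checklist is divided by the same total, a detection bias that scales all counts on that checklist equally leaves this correlation unchanged; biases that differ among species do not cancel in this way.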

      Range biases are also present. Notably, tropical mountain-occupying species have range sizes overestimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al., Nature Communications). These species are often quite rare, too.

      We thank the reviewer for pointing out this issue and reference. We have included a discussion of these biases in our limitations section and cite Ocampo-Peñuela et al. to emphasize the need for improved spatial resolution in range data for more accurate AOR assessments: “More precise range-size estimates would also improve the accuracy of AOR assessments, since species range data are often overestimated due to the failure to capture gaps in actual distributions.”

      Fourth, random error. Random error in ebird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).
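      The reviewer's verbal argument corresponds to the classical attenuation formula. The block below is a sketch under stated assumptions: the errors are additive, independent of the true values, and independent of each other. The symbols X and Y (true abundance and range size) and e_X and e_Y (their measurement errors) are introduced here for illustration and do not appear in the manuscript.

```latex
r_{\text{obs}}
  = \frac{\operatorname{Cov}(X, Y)}
         {\sqrt{\left(\sigma_X^2 + \sigma_{e_X}^2\right)
                \left(\sigma_Y^2 + \sigma_{e_Y}^2\right)}}
  = r_{\text{true}}
    \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + \sigma_{e_X}^2}
          \cdot
          \frac{\sigma_Y^2}{\sigma_Y^2 + \sigma_{e_Y}^2}}
```

      Under these assumptions the covariance is unchanged while both variances grow, so the observed correlation is no larger in magnitude than the true one.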

      We agree random errors can affect estimates, but as we wrote above, random errors, regardless of magnitudes, would not bias estimates. After accounting for sampling error (a part of random errors), little variance is left to be explained as we have shown in the MS. This suggests that many of the random errors were part of the sampling errors. And this is where meta-analysis really shines.

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Extremely high correlations would not make any biological sense if these observations were based on large sample sizes. However, as shown in Figure 2, all extreme correlations come from small sample sizes (i.e., low precision), as sampling theory predicts (indeed, our Fig. 2 is a textbook example of the funnel shape). Therefore, we do not need to invoke any biological explanations here.
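      The funnel shape can be reproduced with a short simulation. This is a hedged sketch rather than the authors' analysis: it draws hypothetical checklists in which abundance and range size are truly uncorrelated and compares the spread of the resulting correlations at two checklist sizes. The checklist sizes, replicate count, and seed are arbitrary choices. The 1/(n - 3) expression is the standard large-sample sampling variance of Fisher's z-transformed correlation.

```python
import random
from math import sqrt
from statistics import pstdev


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_null_correlations(n_species, n_checklists, seed=1):
    """Correlations from checklists where the true correlation is zero."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_checklists):
        x = [rng.gauss(0, 1) for _ in range(n_species)]
        y = [rng.gauss(0, 1) for _ in range(n_species)]
        correlations.append(pearson(x, y))
    return correlations


def fisher_z_sampling_variance(n_species):
    """Approximate sampling variance of Fisher's z for n_species > 3."""
    return 1.0 / (n_species - 3)


# Hypothetical settings: 500 small checklists versus 500 large ones.
small = simulate_null_correlations(n_species=5, n_checklists=500)
large = simulate_null_correlations(n_species=200, n_checklists=500)
spread_small = pstdev(small)  # wide base of the funnel
spread_large = pstdev(large)  # narrow tip of the funnel
```

      In this setup the extreme positive and negative correlations arise from sampling noise in the small checklists alone, which is the pattern the response attributes to low precision.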

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      We agree. We have now extended our explanation of why confirmation bias could result in a positive AOR. We also point out that confirmation bias is a very common phenomenon, for which we cited relevant studies in the original MS. The only way to avoid confirmation bias is to conduct a study blind, but this is often not possible in ecological work.

      “Meta-research on behavioural ecology identified 79 studies on nestmate recognition, 23 of which were conducted blind. Non-blind studies confirmed a hypothesis of no aggression towards nestmates nearly three times more often. It is possible that confirmation bias was at play in earlier AOR studies.”

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

      We agree, and we pointed these out in our introduction.

      Reviewer #3 (Public Review):

      Summary:

      This paper claims to overturn the longstanding abundance occupancy relationship.

      Strengths:

      (1) The above would be important if true.

      (2) The dataset is large.

      We have clarified this point by changing the title to emphasize that we do not suggest overturning AORs entirely but instead provide a refined view of the relationship at a global scale. Our results suggest a weaker and more context-dependent AOR than previously documented. We hope our revised title and additional clarifications in the text convey our intent to contribute to a more nuanced understanding rather than a whole overturning of the AOR framework.

      Weaknesses:

      (1) The authors are not really measuring the abundance-occupancy relationship (AOR). They are measuring abundance-range size. The AOR typically measures patches in a metapopulation, i.e. at a local scale. Range size is not an interchangeable notion with local occupancy.

      We have refined this in our revision to be more explicitly focused on global range size. However, we note that the classic paper by Bock and Ricklefs (1983, Am Nat) also refers to global (species’ entire) range size in the context of the AOR. Importantly, Bock and Ricklefs pointed out the importance of using species’ entire ranges; otherwise, arbitrary geographical ranges, which were used in some studies of AORs, introduce sampling artifacts that create positive AORs. We therefore highlight that our work is well in line with previous work, allowing us to question the longstanding macroecological work. One of the issues with AORs has been how to define occupancy; global range size provides a relatively unambiguous measure, which is why we used it.

      (2) Ebird is a poor dataset for this. The sampling unit is non-standard. So abundance can at best be estimated by controlling for sampling effort. Comparisons across space are also likely to be highly heterogenous. They also threw out checklists in which abundances were too high to be estimated (reported as "X"). As evidence of the biases in using eBird for this pattern, the North American Breeding Bird Survey, a very similar taxonomic and geographic scope but with a consistent sampling protocol across space does show clear support for the AOR.

      Yes, we agree the sampling unit is non-standard. However, this is a significant strength in that it samples across much heterogeneity (as discussed in response to Reviewer 2, above). We were interested in relative rather than absolute abundance per se, and relative abundance is accurate, especially since we controlled for sampling effort.

      We appreciate the reviewer’s attention to our data selection criteria. We excluded checklists containing ‘X’ entries to minimize biases in our abundance estimates. The 'X' notation is often used for the most common species, reflecting the observer's identification of presence without specifying a count. This approach was chosen to avoid disproportionately inflating presence data for these abundant species, which could distort the relative abundance calculations in our analysis. By excluding such checklists, we aimed to retain consistency and ensure that local abundance estimates were representative across all species on each checklist. We have revised the Methods section to clarify how ‘X’ entries are treated, and we hope this explanation addresses the reviewer’s concern.

      (3) In general, I wonder if a pattern demonstrated in thousands of data sets can be overturned by findings in one data set. It may be a big dataset but any biases in the dataset are repeated across all of those observations.

      Overturning a major conclusion requires careful work. This paper did not rise to this level.

      We appreciate the reviewer’s caution regarding broad conclusions based on a single dataset, even one as large as eBird. Our intention was not to definitively overturn the abundance-occupancy relationship (AOR) but to re-evaluate it with the most extensive and globally representative dataset currently available. We recognise that potential biases in citizen science data, such as observer variation, may influence our findings, and we have taken steps to address these in our methodology and limitations sections. We see this work as a contribution to an ongoing discourse, suggesting that AOR may be less universally consistent than previously believed, particularly when tested with large-scale citizen science data. We hope this study will encourage additional research that tests AORs using other expansive datasets and approaches, further refining our understanding of this classic macroecological relationship. However, we have retained our broad message about instigating a credible revolution and re-examining ecological laws.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The investigation focuses solely on interspecific relationships among birds; thus, the extrapolation of these conclusions to broader ecological contexts requires further validation.

      We have now added this point to our new section: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, so we hope this work serves as a foundation for further investigations that utilize such comprehensive datasets.”

      (2) The rationale for combining data from eBird - a platform predominantly representing individual observations from urban North America - with the more globally comprehensive BirdLife International database needs to be substantiated. The potential underrepresentation of global abundance in the eBird checklist data could introduce a sampling bias, undermining the foundational premises of AORs.

      We agree that eBird sampling coverage is limited, but this should not bias our results. In statistical terms, bias is directional; if an error is not directional, it becomes statistical noise, which makes the signal harder to detect. In fact, our meta-analyses adjust for what statisticians call sampling bias, which is a strength of meta-analysis.

      (3) In the full mixed-effect model, checklist duration and sampling variance (inversely proportional to sample size N) are treated as fixed effects. However, these variables are likely to be negatively correlated, which could introduce multicollinearity, inflating standard errors and diminishing the statistical significance of other factors, such as the intercept. This calls into question the interpretation of insignificance in the results.

      Multicollinearity is mainly an issue with small sample sizes. For example, with small datasets, correlations of 0.5 could be an issue, and such an issue would usually show up as a large SE. We do not have such an issue with ~17 million data points. Please refer to this paper:

      Freckleton, Robert P. "Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error." Behavioral Ecology and Sociobiology 65 (2011): 91-101.

      (4) The observed low heterogeneity may stem from discrepancies in sampling for abundance versus occupancy, compounded by uncertainties in reporting behavior.

      If we assume everybody underreports common species or overreports rare species, this could happen. However, such an assumption is unlikely. If some people report accurately (but not others), we should see high heterogeneity, which we do not observe. We touched upon this point in our original MS.

      (5) The contribution and implementation of phylogenetic comparative analysis remain ambiguous and were not sufficiently clarified within the study.

      We have added more explanation for the global abundance analysis:

      “To statistically test whether there was an effect of abundance and occupancy at the macro-scale, we used phylogenetic comparative analysis. This analysis also addresses the issue of positive interspecific AORs potentially arising from not accounting for phylogenetic relatedness among species examined.”

      (6) The use of large N checklists could skew the perceived rarity or commonality of species, potentially diminishing the positive correlation observed in AORs. A consistent observer effect could lead to a near-zero effect with high precision.

      Regardless of the number of species (N) in a checklist (see Fig. 2), correlations are distributed around zero. This means there is nothing special about large-N checklists.

      (7) The study should acknowledge and discuss any discrepancies or deviations from previous literature or expected outcomes.

      We felt we had already done this, as we discussed the previous meta-analysis and what we expected from this meta-analysis. Nevertheless, we have added some relevant sentences to the new version of the MS.

      In addition to these major points, there are several minor concerns:

      (1) Figure 2B lacks discussion, and the metric for the number of observations is not clarified. Furthermore, the labeling of the y-axis appears to be incorrect.

      Thank you very much for pointing out this shortcoming. Now, the y-axis label has been fixed and we mention 2B in the main text.

      (2) The study should provide a clear, mathematical expression of the multilevel random effect models for greater transparency.

      Many thanks for this point, and now we have added relevant mathematical expressions in Table S6.

      (3) On Line 260, the term "number of species" should be refined to "number of species in a checklist," ideally represented by a formula for precision.

      This ambiguity has been mended as suggested.

      Please provide the data and R code linked to the outputs.

      The referee must have missed the link (https://github.com/itchyshin/AORs) in our original MS. In addition to our GitHub repository link, we now have added a link to our Zenodo repository (https://doi.org/10.5281/zenodo.14019900).

      Reviewer #3 (Recommendations For The Authors):

      The authors cite Rabinowitz's 7 forms of rarity paper as a suggestion that previous findings also break the AOR. In fact empirical studies of the 7 forms of rarity typically find that all three forms of rareness vs commonness are heavily correlated (e.g. Yu & Dobson 2000).

      We thank the reviewer for drawing attention to Yu & Dobson (2000) and similar studies that find positive correlations among the axes of rarity. Reviewer 3 is correct that Rabinowitz’s (1981) framework does not require that local abundance and geographic range size be uncorrelated for every species; instead, it highlights conceptual scenarios where a species may be common locally yet have a restricted distribution (or vice versa).

      Empirical analyses such as Yu & Dobson (2000) show that, on average, these axes can be correlated, which may align with conventional AOR findings in some taxonomic groups. However, Rabinowitz’s key insight was that exceptions do occur, and these exceptions demonstrate that strong positive AORs may not be universally applicable. Our results do not claim that Rabinowitz’s framework “breaks” the AOR outright; instead, we use it to underscore that local abundance can, in principle, be “decoupled” from global occupancy. Whether the correlation found by Yu & Dobson (2000) implies a positive AOR requires a detailed simulation study, which is an interesting avenue for future research.

      Thus, citing Rabinowitz serves to highlight the potential heterogeneity and complexity of abundance–occupancy relationships rather than to refute every positive correlation reported in the literature. Our findings suggest that when examined at large spatiotemporal scales (with unbiased sampling), the overall AOR signal may be less robust than traditionally believed. This is consistent with Rabinowitz’s view that local abundance and global range can vary along independent axes. We have now added:

      “Although studies using her framework found positive correlations between species range and local abundance.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      This manuscript uses a well-validated behavioral estimation task to investigate how optimistic belief updating was attenuated during the 2020 global pandemic. Online participants recruited during and outside of the pandemic estimated how likely different negative life events were to happen to them in the future and were given statistics about these events happening. Belief updating (measured as the degree to which estimations changed after viewing the statistics) was less optimistically biased during the pandemic (compared to outside of it). This resulted from reduced updating from "good news" (better than expected information). Computational models were used to try to unpack how statistics were integrated and used to revise beliefs. Two families of models were compared - an RL set of models where "estimation errors" (analogous to prediction errors in classic RL models) predict belief change and a Bayesian set of models where an implied likelihood ratio was calculated (derived from participants estimations of their own risk and estimation of the base rate risk) and used to predict belief change. The authors found evidence that the former set of models accounted for updating better outside of the pandemic, but the latter accounted for updating during the pandemic. In addition, the RL model provides evidence that learning was asymmetrically positively biased outside of the pandemic but symmetric during it (as a result of reduced learning rates from good news estimation errors).

      Strengths:

      Understanding whether biases in learning are fixed modes of information processing or flexible and adapt in response to environmental shocks (like a global pandemic or economic recession) is an important area of research relevant to a wide range of fields, including cognitive psychology, behavioral economics, and computational psychiatry. The study uses a well-validated task, and the authors conduct a power analysis to show that the sample sizes are appropriate. Furthermore, the authors test that their results hold in both a between-group analysis (the focus of the main paper) and a within-group analysis (mainly in the supplemental).

      The finding that optimistic biases are reduced in response to acute stress, perceived threat, and depression has been shown before using this task both in the lab (social stress manipulation), in the real world (firefighters on duty), and clinical groups (patients with depression). However, the work does extend these findings here in important ways:

      (1) Examining the effect of a new real-world adverse event (the pandemic).

      (2) The reduction in optimistic updating here arises due to reduced updating from positive information (previously, in the case of environmental threat, this reduction mainly arose from increased sensitivity to negative information).

      (3) Leveraging new RL-inspired computational approaches, demonstrating that the bias - and its attenuation - can be captured using trial-by-trial computational modeling with separate learning rates for positive and negative estimation errors.

      Weaknesses:

      Some interpretation and analysis (the computational modeling in particular) could be improved.

      On the interpretation side, while the pandemic was an adverse experience and stressful for many people (including myself), the absence of any measures of stress/threat levels limits the conclusions one can draw. Past work that has used this task to examine belief updating in response to adverse environmental events took physiological (e.g., SCR, cortisol) and/or self-report (questionnaires) measures of mood. In SI Table 1, the authors possibly had some questionnaire measures along these lines, but this might be for the participants tested during the pandemic.

      Thank you for this review.

      We agree that the lack of physiological and self-report measures of stress, threat, and perceived uncertainty limits the interpretation of findings regarding potential psychological factors. Some self-reported anxiety and perceived risk measures experienced during the lockdowns were collected in a subset of participants (n=40, counting n=21 tested before and during the 1st strict lockdown, and n=19 tested solely during the 1st lockdown). These reports were given retrospectively at the time of release of the 1st lockdown in summer 2020 when the pandemic was still unfolding (SI Table 1).

      Exploratory correlations revealed some noteworthy trends. Participants who reported perceiving a greater risk of death due to contagion were also less optimistically biased when updating their beliefs about adverse future life risks during the first strict COVID-19-related lockdown (Pearson’s r = -0.36, p = 0.02).

      Moreover, parameter estimates from the computational models of belief updating showed associations with specific survey responses: The rational Bayesian model’s scaling parameter correlated positively with adherence to distancing measures (r = 0.41, p = 0.01) and negatively with the need for social contact (r = -0.37, p = 0.02). This result indicated that participants who were updating their beliefs faster were more likely to follow preventive guidelines and felt less social craving. Meanwhile, the asymmetry parameter correlated negatively with mask wearing (r = -0.41, p = 0.01), positively with physical contact with close others (r = 0.32, p = 0.04) and satisfaction with social interactions (r = 0.33, p = 0.04). This suggests that participants who displayed some asymmetry in belief updating during the COVID-19 pandemic were less likely to comply with mask-wearing rules and more likely to engage in social interactions.

      However, these results did not survive correction for multiple comparisons, and the sample size for the correlational analyses is in the lower range. The subjective measures of anxiety and fear of contagion did not significantly correlate with the updating bias or with any other variable measured by the belief updating task (e.g., estimation error, updating magnitude).
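      To illustrate why the exploratory correlations above would not survive correction, the sketch below applies a Holm step-down procedure to the six p-values reported in this response. This is an assumption-laden example: the response does not state which correction method was used or how many tests were run in total, so both the Holm procedure and the family of six tests are hypothetical choices, and the p-values are the rounded values quoted above.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one reject decision per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        # The smallest p-value is compared with alpha / m, the next with
        # alpha / (m - 1), and so on; stop at the first non-rejection.
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break
    return reject


# Rounded p-values quoted in the response (perceived risk of death,
# distancing, need for social contact, mask wearing, physical contact,
# satisfaction with social interactions).
reported_p = [0.02, 0.01, 0.02, 0.01, 0.04, 0.04]
survives = holm_reject(reported_p)
```

      With these six values the smallest p-value (0.01) already exceeds the first Holm threshold of 0.05 / 6, so none of the tests is rejected; a larger family of tests would be stricter still.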

      We now discuss this limitation further on page 12, which reads:

      “We did not collect physiological measures of stress or information about the COVID-19 infection status of participants, which precludes a direct exploration of the immediate effects of experiencing the infection on belief-updating behavior and the potential interaction with anxiety and stress levels. Although subjective ratings of the perceived risk of death from COVID-19 correlated negatively to the beliefs updating bias measured during the pandemic, this result was obtained retrospectively in a subset of participants (SI section 4). We thus cannot directly attribute the observed lack of optimistically biased belief updating during the lockdown to psychological causes such as heightened anxiety and stress. This limitation is noteworthy, as the impact of experiencing the pandemic on belief updating about the future could differ between those who directly experienced infection and those who remained uninfected. It is also important to acknowledge that our study was timely and geographically limited to the context of the COVID-19 outbreak in France. Cultural variations and differences in governmental responses to contain the spread of SARS-CoV-2 may have impacted the optimism biases in belief updating differently.”

      On the analysis side, it was unclear what the motivation was for the different sets of models tested. Both families of models test asymmetric vs symmetric learning (which is the main question here) and have similar parameters (scaling and asymmetry parameters) to quantify these different aspects of the learning process. Conceptually, the different behavioral patterns one could expect from the two families of models needed to be clarified.

      Thank you for raising this point. We agree that a clearer conceptual distinction between the two model families can help strengthen the interpretation of our findings. We have added the following considerations to the introduction on pages 2–3, which now reads:

      “The underlying mechanism of optimistically biased belief updating involves an asymmetry in learning from positive and negative belief-disconfirming information[2,3,4], which can unfold in two ways following Reinforcement learning (RL) or Bayes rule[5].

      Conceptually, Reinforcement learning (RL) and Bayesian models of belief updating are complementary but make different assumptions about the hidden process humans may use to adjust their beliefs when faced with information that contradicts them. The RL models assume belief updating is proportional to the estimation error. The key idea of the estimation error expresses the difference between how much someone believes they will experience a future life event and the actual prevalence of the event in the general population. This difference can be positive or negative. A scaling and an asymmetry parameter quantify the propensity to consider the estimation error magnitude and its valence, respectively. These two free parameters form the learning rate, which indicates how fast and biased participants update their beliefs.

      In contrast, Bayesian models assume that following Bayes’ rule the posterior, updated belief is a new hypothesis, formed by pondering prior knowledge with new evidence. The prior knowledge consists in information about the prevalence of life events in the general population. The new evidence comprises various alternative hypotheses. It examines how likely a specific event is to occur or not occur for oneself, compared to the likelihood that it will happen or not happen to others. This probabilistic adjustment of beliefs about future life events can be considered as an approximation of a participant’s confidence in the future. The two free parameters of the Bayesian belief updating model scale how much the initial belief deviates from the updated, posterior belief (i.e., scaling parameter) and the propensity to consider the valence of this deviance (i.e., asymmetry parameter).

      Although RL-like and Bayesian updating models make different assumptions about the updating strategy, they are complementary and powerful formalizations of human reasoning. Both models provide insight into hidden, latent variables of the updating process, most notably the learning rate and its components, the scaling and asymmetry parameters, which can vary between individuals and contexts and, through this variance, offer possible explanations for the idiosyncrasy in belief-updating behavior and its cognitive biases.”
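
      The contrast between the two model families described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names, the parameterisation of the learning rate as scaling plus or minus asymmetry, the definition of the estimation error as first estimate minus base rate, and the odds-ratio form of the Bayesian posterior are all assumptions made here. The free parameters of the Bayesian family are left out for brevity.

```python
def rl_update(e1, base_rate, scaling, asymmetry):
    """RL-like update: proportional to the estimation error.

    Assumption: estimation error = first estimate - base rate, so for
    adverse events a positive error is good news. The learning rate is
    assumed to be scaling + asymmetry for good news and
    scaling - asymmetry for bad news.
    """
    estimation_error = e1 - base_rate
    good_news = estimation_error > 0
    learning_rate = scaling + asymmetry if good_news else scaling - asymmetry
    return learning_rate * abs(estimation_error)


def odds(p):
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def bayes_posterior(e1, e_other, base_rate):
    """Rational Bayesian posterior for one's own risk (probabilities in (0, 1)).

    Assumption: the prior is the base rate, and the evidence is a
    likelihood ratio comparing the estimate for oneself (e1) with the
    estimate for someone else (e_other).
    """
    likelihood_ratio = odds(e1) / odds(e_other)
    posterior_odds = odds(base_rate) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

      With a positive asymmetry value, the RL-like learner updates more after good news than after bad news of the same magnitude. In the Bayesian sketch, a participant who rates their own risk the same as that of others ends up exactly at the base rate.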

      Do the "winning" models produce the main behavioral patterns in Figure 1, and are they in some way uniquely able to do so, for instance? How would updating look different for an optimistic RL learner versus an optimistic Bayesian RL learner?

      We now show that the winning models can reproduce the main behavioral patterns (revised Figure 1b).

      Moreover, we plotted estimated and observed average belief updating for each participant (n=123) using the overall best-fitting asymmetrical RL-like updating model shown in SI Figure 6.

      Would the asymmetry parameter in the former be correlated with the asymmetry parameter in the latter? Moreover, crucially, would one be able to reliably distinguish the models from one another under the model estimation and selection criteria that the authors have used here (presenting robust model recovery could help to show this)?

      The asymmetry parameter estimated with the optimistically biased RL- and Bayesian models did correlate (r = 0.735; p < 0.001).

      However, we argue that while the observed updating behavior and estimated free parameters are similar for RL-like and Bayesian learners, the underlying assumed cognitive processes differed and are identifiable. To test this assumption, we have added a model recovery analysis now reported in the supplement section 2c and main manuscript’s methods section pages 24–25.

      As shown in the confusion matrix in SI Figure 5, there is evidence for strong recovery of nearly all models, and importantly for the two winning models: the optimistically biased RL-like model and the rational Bayesian model of belief updating. This analysis thus rules out that the two model families were confused and mitigates concerns about the validity of the model selection.

      Note that one exception was observed. The RL-like and Bayesian updating models that assumed no scaling and asymmetry were best recovered by their respective models that estimated the asymmetry parameter. Many factors could explain this. For example, it could be that the models that assumed asymmetry, but no scaling, may have captured some bias in updating due to noise generated by the zero-parameter models.
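
      For readers unfamiliar with model recovery, the tallying step behind a confusion matrix like the one described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the model labels are hypothetical, and the simulation and fitting steps that produce each (generating model, best-fitting model) pair are assumed to have been run already.

```python
from collections import Counter


def confusion_matrix(pairs, models):
    """Row-normalised model recovery matrix.

    pairs  : iterable of (generating_model, best_fitting_model) labels,
             one per simulated dataset
    models : list of model labels defining row and column order
    Returns matrix[generating][fitted] = proportion of datasets simulated
    from `generating` that were best fit by `fitted`.
    """
    counts = Counter(pairs)
    matrix = {}
    for generating in models:
        row_total = sum(counts[(generating, fitted)] for fitted in models)
        matrix[generating] = {
            fitted: (counts[(generating, fitted)] / row_total if row_total else 0.0)
            for fitted in models
        }
    return matrix
```

      Strong recovery corresponds to a matrix with values close to 1 on the diagonal. The exception noted above would appear as an off-diagonal cell that exceeds the diagonal value in its row.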

      A justification is also needed to focus on the "RL-like updating model with an asymmetry and scaling learning rate component" in Figure 3. As I understand it, this model fits best outside of the pandemic, but another model - the Rational Bayesian Model - does worse (and does the best during the pandemic). What model best combines the groups (outside and inside the pandemic)?

      We thank the reviewer for highlighting the need to justify our focus on the biased RL-like updating model in Figure 3. The model chosen for parameter comparison was selected based on a model comparison procedure conducted across all 12 models, including data from all participants (both those tested outside and during the pandemic, n=123). This model comparison revealed that Model 1 — the RL model with both asymmetry and scaling learning rate parameters estimated — provided the best fit across the entire dataset (Ef = 0.40, pxp = 0.99). As such, we focused on this model for parameter comparisons in Figure 3 to ensure consistency with the model comparison results and to interpret the parameters in the context of the overall best-fitting model. We added this information on top of the model parameter comparison results on page 8. Moreover, SI Figure 6 in the supplements shows how this model reproduces the observed belief updating in each of the 123 participants.

      Why do the authors use absolute belief updating (|UPD|) in the first linear mixed effects model (equation iv)? Since an update is calculated differently depending on whether information calls for an update in an upward or downward direction, I do not understand the need to do this (and it means that updates that go in the wrong direction - away from the information - are counted as positive)

      Thank you for drawing our attention to this point. The ‘absolute belief updating’ note was incorrect, and we apologize for the confusion. To be precise, we did not use absolute updating values in our analyses. Belief updating was assumed on each trial to go either toward the base rate (e.g., Update = E2 – E1) for negative estimation errors or away from it for positive estimation errors (e.g., Update = E1 – E2). Updates that went in the wrong direction, further away from the base rate, were thus counted and included in the analysis with their negative sign. We have corrected this important point in equation iv of the methods section on page 19.
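
      The sign convention described in this response can be made concrete with a short sketch. It assumes that the estimation error is the first estimate minus the base rate, which the response does not state explicitly, and the function name is hypothetical.

```python
def signed_update(e1, e2, base_rate):
    """Signed belief update following the two formulas quoted above.

    Assumption: estimation error = e1 - base_rate.
    Negative error: Update = E2 - E1.
    Positive error: Update = E1 - E2.
    A trial with zero estimation error falls into the second branch here.
    """
    estimation_error = e1 - base_rate
    if estimation_error < 0:
        return e2 - e1
    return e1 - e2
```

      Under this convention a positive value means that the second estimate moved in the direction of the presented information, and a negative value marks an update in the wrong direction. For example, `signed_update(30, 35, 20)` is negative because the estimate moved further from a base rate of 20.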

      Figure 4: The task schema does not show a confidence rating for base rates.

      Thank you for catching this. We have now added the confidence ratings for base rates to the task in Figure 4b in the revised version of the manuscript. We have furthermore corrected a typo in Figure 4a: The sample size for group 3, tested in May 2021, now indicates 31.

      The authors report that base rates are uniformly distributed - this is quite different to other instances of the task where base rates are normally distributed (ideally around the midpoint of the scale). Why this deviation in the design?

      We used life events and base rates like those used in past studies of belief updating (Garrett and Sharot 2017, Sharot et al. 2011, Garrett et al. 2017, Korn et al. 2017), which were normal to uniformly distributed (W = 0.952, p = 0.088, Shapiro-Wilk test). The base rates ranged between 10% and 70%, with a mean of 40%. Participants rated their estimates between 3% and 77%, which ensured that for most likely (base rate = 70%) and most unlikely events (base rate = 10%) there was the same space (7%) to update beliefs toward the base rates. Moreover, all statistical models included the absolute estimation errors as a control for variance potentially explained by different estimation error magnitude[42,43]. We added this extra base rate information to the methods section’s task description on page 16.

      The task comprises only negative life events, which arguably hinders the generalizability of the results. The authors could mention this as a limitation (there has been a significant quantity of debate about this point in relation to this task: see the work from Ulrike Hahn's lab).

      We have added a paragraph to the discussion page 13 to provide a rationale for using only adverse events. This paragraph now reads:

      “In this study we tested how actual adverse experiences affect the updating of negative future outlooks in healthy participants and in analogy to studies conducted in depressed patients[19,20,24] following the cognitive model of depression[37]. One open question is whether findings were specific to the adverse event framing[38,39,40]. We argue that under normal, non-adverse contexts belief updating should also be optimistically biased for positive life events, as shown by previous research[41,42]. However, how context, such as experiencing a challenging or favorable situation, influences the updating of beliefs about positive and negative outlooks remains an open question.”

      It would be useful to show the parameter recovery for all parameters (not just the learning rates) and the correlation between parameters (both in simulations and in the fitted parameters).

      We apologize for being unclear on this part. The models included two free parameters that were the components of the learning rates: The scaling and the asymmetry parameter. We now have added parameter recovery analyses for the scaling and asymmetry components of the learning rates for (1) the Bayesian model of belief updating during the pandemic, and (2) the RL-like model of belief updating outside the pandemic to the supplement (SI section 2b, SI Figure 4).

      Reviewer #2:

      The authors investigated how experiencing the COVID-19 pandemic affected optimism bias in updating beliefs about the future. They ran a between-subjects design testing participants on cognitive tasks before, during, and after lifting the sanitary state of emergency during the pandemic. The authors show that optimism bias varied depending on the context in which it was tested. Namely, it disappeared during COVID-19 and re-emerged when the sanitary emergency measures were lifted. Through advanced computational modeling, they are able to thoroughly characterize the nature of such alternations, pinpointing specific mechanisms underlying the lack of optimistic bias during the pandemic.

      Strengths pertain to the comprehensive assessment of the results via computational modeling and from a theoretical point of view to the notion that environmental factors can affect cognition. However, the relatively small sample size for each group is a limitation.

      Thank you for this review.

      We acknowledge that sample sizes in each group are small, especially when breaking down the participant sample into four sub-samples tested in the different contexts. To mitigate concerns, we checked the power of the observed context-by-valence interaction on belief updating. To this end, we simulated new belief updates using the parameters from the best-fitting optimistic RL-like model of observed belief updating outside the pandemic, and the rational Bayesian model of observed belief updating during the pandemic. At each iteration we performed a linear mixed effects model (LME) analysis of the simulated belief updates[44] analogous to equation iv in the main text. The frequency across 1000 iterations with which the LMEs detected a significant interaction of valence by context on simulated belief updating was 75%. This frequency indicates the power of the valence-by-context interaction on observed belief updating. In other words, the probability of a false negative (a type II error, i.e., failing to reject the null hypothesis when the effect was present) was 25%. We have added these extra analyses to the main manuscript’s results section on page 4 and methods section on page 20.
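
      The logic of a simulation-based power estimate such as the one described above can be sketched as follows. This is a deliberately simplified stand-in, not the authors' analysis: it replaces the linear mixed effects model with a two-sample z-test on per-participant bias scores (good-news minus bad-news update), draws those scores from normal distributions rather than from fitted RL-like or Bayesian models, and uses hypothetical group sizes and effect sizes.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_power(n_outside, n_during, bias_outside, bias_during,
                   sd=5.0, n_iter=1000, alpha=0.05, seed=0):
    """Fraction of simulated datasets with a significant group difference.

    Each simulated participant contributes one bias score. The two
    contexts are compared with a two-sided z-test (normal approximation),
    which stands in for the context-by-valence interaction term.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_iter):
        outside = [rng.gauss(bias_outside, sd) for _ in range(n_outside)]
        during = [rng.gauss(bias_during, sd) for _ in range(n_during)]
        standard_error = (variance(outside) / n_outside
                          + variance(during) / n_during) ** 0.5
        z = (mean(outside) - mean(during)) / standard_error
        if abs(z) > z_crit:
            significant += 1
    return significant / n_iter
```

      The returned proportion plays the same role as the 75% reported above: it is the share of simulated experiments in which the effect built into the simulation was detected.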

      A major impediment to interpreting the findings is the need for additional measures. While the information on, for example, risk perception or the need for social interaction was collected from participants during the pandemic, the fact that these could not be included in the analysis hinders the interpretation of findings, which is now generally based on data collected during the pandemic, for example, reporting increased stress. While the authors suggest an interpretation in terms of uncertainty of real-life conditions, it is currently difficult to know if that factor drove the effect. Many concurrent elements might have accounted for the findings. This limits understanding of the underlying mechanisms related to changes in optimism bias.

      We agree with the reviewer on the limitation arising from the lack of physiological and self-report measures of stress, threat, and perceived uncertainty. To address this point and a similar point raised by reviewer 1 we have added a section to the supplement (SI section 4) that now reports explorative correlations between questionnaire responses of subjective perceptions of risk and anxiety, behavior (e.g. mask wearing, social distancing) and belief updating measured during the 1st strict lockdown.

      We now also further discuss this limitation on page 12 of the main text’s discussion.

      I recommend that the authors spend more time on explaining the belief-updating task in the presentation of the experiment.

      Thank you for this advice. We now provide a clearer and more detailed description of the belief-updating task in the main manuscript’s methods section and have updated Figure 4b to display the confidence rating event in the task schema.

      The task description now reads:

      “As illustrated in Figure 4b, each of the 40 trials began with presenting an adverse life event. Participants estimated their own risk and the risk of someone else their age and gender. Then the base rate of the event occurring in the general population was displayed on the computer screen. Participants rated their confidence in the accuracy of the presented base rate. Finally, they re-estimated their risk for experiencing the event now informed by the base rate.”

      The experimental task seems to include a self-other dimension, which is completely disregarded in the analysis. It would be interesting to explore whether the effect of diluted optimism bias during the pandemic is specific to information about self vs. Other.

      We appreciate the reviewer's observation regarding the self-versus-other dimension in the belief updating task design. As now shown in SI Figure 2, the participants indeed displayed an optimism bias: They estimated that adverse events are more likely to happen to others than to themselves (ß = 3.02, SE = 0.86, t (232) = 3.53, p = 5.09e-04, 95% CI [1.33 – 4.71]; SI Figure 2; SI Table 18). This effect was observed across all participants. The pandemic context had no significant effect (ß = -1.91, SE = 3.00, t (232) = -0.64, p = 0.52, 95% CI [-7.82 – 4.00]; SI Table 18). Moreover, following previous studies of optimistically biased belief updating, we tested the effect of estimation errors (EE) calculated on the difference between the estimate for someone else (eBR) and the base rate (BR), following: EE = eBR – BR[4,5,25,26]. When categorizing trials as good news or bad news based on this alternative EE calculation, the context-by-EE valence interaction remained significant (SI Table 6).

      We conclude from these additional analyses that experiencing the pandemic specifically influenced belief updating but did not affect optimism biases in initial beliefs about the future.

      Please provide an English translation of the instructions for the task.

      We now provide an English translation of the task instructions in the Supplement section 5.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The production of ROS has been measured in a very superficial way.

      The term "ROS" covers a plethora of chemical species that exert different physiological effects on different cells and in different situations.

      Mitochondria are one source, but not the only source, of ROS production. Measuring ROS only with MitoSOX does not reflect the cellular condition of ROS in a specific condition. I would suggest the authors consider doing IF for markers specific to oxidative stress, such as carbonyl groups, and also, perhaps, Amplex Red for determining average oxidative stress and ROS production in the cells.

      We agree with the reviewer that a detailed analysis of ROS production and its markers would strengthen the manuscript. Accordingly, we will perform the Amplex Red assay for Figure 1.

      (2) The 8-OHG signal seems very confusing in Figure 7E. 8-OHG is supposed to be mainly in the nucleus and to some extent in mitochondria. The signal is very diffuse in the images. I would suggest higher-magnification and better-resolution images for 8-OHG. Also, the VWF signal is pretty weak, whereas it should be strong given that the staining is in the aorta. The authors should redo the experiments.

      The reviewer’s comment is correct regarding the expected signal. We will repeat the assays. However, we would like to note that the flat morphology of the endothelial cell monolayer on the aortic surface may limit the visualization of subcellular signal differentiation when transversely sectioned.

      (3) The PCA analysis is not quite clear. Why is there a convergence among the plots? The authors should explain. Also, I would suggest that the authors redo the analysis in Figure 8B with R-based packages. IPA, though user-friendly, mostly does not yield meaningful results, and the statistics it carries out are not accurate. The authors should redo the analysis in R or Python, whichever is suitable for them.

      Thank you for your valuable feedback. We acknowledge the concern regarding the PCA analysis and the convergence observed in the plots. In the revised manuscript, we will revise our interpretation to clarify this observation.

      Additionally, we appreciate your suggestion to use R-based packages for pathway analysis. We will make efforts to regenerate the analysis presented in Figure 8B using R to enhance the statistical robustness and reproducibility of our results.

      (4) The MS analysis part seems pretty vague in methods. Please rewrite.

      We will revise the methods section to improve the legibility.

      Reviewer #2 (Public review):

      All the experiments performed here are in overexpression background therefore, it would be crucial to show that p66Shc is SUMO2ylated at physiological levels.

      To address this concern, we will attempt to assess p66Shc-SUMO2 levels under physiological conditions. However, we would like to highlight a technical limitation: the currently available antibodies do not distinguish p66Shc from other isoforms, nor SUMO2 from SUMO3. Therefore, enriching for the endogenous p66Shc-SUMO2 adduct will require novel tools and techniques, which we are actively exploring.

      Reviewer #3 (Public review):

      One notable weakness is that the link between the observed cellular changes and the ultimate in vivo phenotype remains only partially explored. While the authors successfully show that p66ShcK81R knockin mice are protected from endothelial dysfunction in a hyperlipidemic context, additional experiments characterizing the broader tissue-specific roles, or examining further endothelial assays in vivo, would strengthen the mechanistic conclusions. It would also be beneficial to see more direct evaluations of p66Shc subcellular localization in the protective knockin mice to complement the proteomic findings.

      That is an excellent suggestion. We will determine the tissue specific distribution of endogenous p66ShcK81R.

      Despite these gaps, the data broadly support the authors' main conclusions. The authors lay out a plausible mechanistic pathway for how hyperlipidemia and increased global SUMOylation can converge on the oxidative stress pathway to provoke vascular dysfunction.

      The likely impact of this work on the field is noteworthy. Beyond clarifying how a single post-translational modification event can influence the pathophysiology of endothelial cells, the study provides a model for investigating broader roles of SUMO2 in other cardiovascular conditions and highlights the importance of identifying additional SUMOylation sites and their downstream impact.

      In conclusion, by demonstrating the direct SUMOylation of p66Shc at lysine-81 and linking that modification to endothelial dysfunction in a hyperlipidemic mouse model, this paper offers valuable insights into how broadly acting post-translational modifiers can evoke specific pathological effects.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the utility of spatial image correlation spectroscopy (ICS) for measuring physiological responses to DNA damage. ICS is a long-established (~1993) method similar to fluorescence correlation spectroscopy, for deriving information about the fluorophore density that underlies the intensity distributions of images. The authors first provide a technical but fairly accessible background to the theory of ICS, then compare it with traditional spot-counting methods for its ability to analyze the characteristics of γH2AX staining. Based on the degree of aggregation (DA) value, the authors then survey other markers of DNA damage and uncover some novel findings, such as that RPA aggregation inversely tracks the sensitivity to PARP inhibitors of different cell lines.

      The need for a more objective and standardized tool for analyzing DNA damage has long been felt in the field and the authors argue convincingly for this. The data in the manuscript are in general well-supported and of high quality, and show promise of being a robust alternative to traditional focus counting. However, there are a number of areas where I would suggest further controls and explanations to strengthen the authors' case for the robustness of their ICS method.

      Strengths:

      The spatial ICS method the authors describe and demonstrate is easy to perform and applicable to a wide variety of images. The DDR was well-chosen as an arena to showcase its utility due to its well-characterized dose-responsiveness and known variability between cell types. Their method should be readily useable by any cell biologist wanting to assess the degree of aggregation of fluorescent tags of interest.

      Weaknesses:

      The spatial ICS method, though of longstanding history, is not as intuitive or well-known as spot-based quantitation. While the Theory section gives a standard mathematical introduction, it is not as accessible as it could be. Additionally, the values of TNoP and DA shown in the Results are not discussed sufficiently with regard to their physical and physiological interpretation.

      We agree that a major limitation in the adoption of this approach is the need for a deeper understanding of the theory and results. We have updated the theory section to include further discussion (page 4, line 132).

      The correlation of TNoP with γH2AX foci is high (Figure 2) and suggestive that the ICS method is suitable for measuring the strength of the DDR. The authors correctly mention that the number of spots found using traditional means can vary based on the parameters used for spot detection. They contrast this with their ICS detection method; however, the actual robustness of spatial ICS is not given equal consideration.

      We found it difficult to give equal consideration of robustness to ICS. The major limitation of traditional approaches is proper selection of an intensity threshold that is necessary to define and separate foci from background intensity. However, ICS does not employ a threshold, therefore we could not test different thresholding applications in ICS as we did with traditional methods. In our view the absence of the need for a threshold is profoundly advantageous. The only inputs we employ in the ICS analysis are used to segment cell nuclei, yet these have no impact on the ICS calculation and are necessary for any analysis of the DDR.

      Reviewer #2 (Public review):

      Summary:

      Immunostaining of chromatin-associated proteins and visualization of these factors through fluorescence microscopy is a powerful technique to study molecular processes such as DNA damage and repair, their timing, and their genetic dependencies. Nonetheless, it is well-established that this methodology (sometimes called "foci-ology") is subject to biases introduced during sample preparation, immunostaining, foci visualization, and scoring. This manuscript addresses several of the shortcomings associated with immunostaining by using image correlation spectroscopy (ICS) to quantify the recruitment of several DNA damage response-associated proteins following various types of DNA damage.

      The study compares automated foci counting and fluorescence intensity to the image correlation spectroscopy degree of aggregation to study the recruitment of DNA repair proteins to chromatin following DNA damage. After validating image correlation spectroscopy as a reliable method to visualize the recruitment of γH2AX to chromatin following DNA damage in two separate cell lines, the study demonstrates that this new method can also be used to quantify RPA1 and Rad51 recruitment to chromatin following DNA damage. The study further shows that RPA1 signal as measured by this method correlates with cell sensitivity to Olaparib, a widely-used PARP inhibitor.

      Strengths:

      Multiple proof-of-concept experiments demonstrate that using image correlation spectroscopy degree of aggregation is typically more sensitive than foci counting or foci intensity as a measure of recruitment of a protein of interest to a site of DNA damage. The sensitivity of the SKOV3 and OVCA429 cell lines to MMS and the PARP inhibitors Olaparib and Veliparib as measured by cell viability in response to increasing amounts of each compound is a valuable correlate to the image correlation spectroscopy degree of aggregation measurements.

      Weaknesses:

      The subjectivity of foci counting has been well-recognized in the DNA repair field, and thus foci counts are usually interpreted relative to a set of technical and biological controls and across a meaningful time period. As such:

      (1) A more detailed description of the numerous prior studies examining the immunostaining of proteins such as γH2AX, RAD51, and RPA is needed to give context to the findings presented herein.

      We apologize for not providing enough detail. We have added further references and discussion. γH2AX foci counting, in particular, has been used in thousands of previous studies (page 18, lines 513 and 517).

      (2) The benefits of adopting image correlation spectroscopy should be discussed in comparison to other methods, such as super-resolution microscopy, which may also offer enhanced sensitivity over traditional microscopy.

      Thank you for raising this point. We have added this discussion (page 19, line 553). The limiting factor that ICS addresses is the partition coefficient of signal in a focus or cluster versus outside the cluster. Super-resolution will not necessarily improve this unless it is resolved down to single-molecule counting. However, one would still need to evaluate how to define a cluster or focus against the background of non-clustered distribution.

      (3) Additional controls demonstrating the specificity of their antibodies to detection of the proteins of interest should be added, or the appropriate citations validating these antibodies included.

      We have added text stating that we only use validated antibodies (page 6, line 193). One thing to note is that we are measuring differences between treatment conditions; thus, if an antibody has non-specific labeling of proteins or cellular structures that do not change upon treatment, our approach would overcome this limitation.

      Reviewer #3 (Public review):

      Summary:

      This paper described a new tool called "Image Correlation Spectroscopy" (ICS) to detect clustering fluorescence signals such as foci in the nucleus (or any other cellular structures). The authors compared ICS DA (degree of aggregation) data with Imaris Spots data (and ImageJ Find Maxima data) and found a comparable result between the two analyses and that the ICS sometimes produced a better quantification than the Imaris. Moreover, the authors extended the application of ICS to detect cell-cycle stages by analyzing the DAPI image of cells. This is a useful tool without the subjective bias of researchers and provides novel quantitative values in cell biology.

      Strengths:

      The authors developed a new tool to detect and quantify the aggregates of immunofluorescent signals, which is a center of modern cell biology, such as the fields of DNA damage responses (DDR), including DNA repair. This new method could detect the "invisible" signal in cells without pre-extraction, which could prevent the effect of extracted materials on the pre-assembled ensembles, a target for the detection. This would be an alternative method for the quantification of fluorescent signals relative to conventional methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The ICS theory section is essential and based on an excellent review from one of the authors. It would benefit greatly from a diagram showing where the quantities g(0,0), ω0, and g∞ come from in the 2D Gaussian fit, ideally for two cases where these quantities differ (i.e., how they correspond to different DA or TNoP values). In my opinion, this addition would greatly increase the manuscript's accessibility for DDR researchers. The citation of the review at the beginning would also be a plus.

      We have added the review citation at the front of the theory section (page 3, line 87). We have highlighted in Figure 2D where g(0,0), the most critical measurement for determination of TNoP and DA, derives from. However, it is difficult to describe all the curve fit parameters in an image, as they have some interdependency on each other, and thus labeling one in a single image would not independently capture how they might be observed in a different curve fit.
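
      For orientation, the three quantities named in the reviewer's comment are the parameters of the 2D Gaussian that is conventionally fitted to the spatial autocorrelation function in the ICS literature. The form below is that standard expression and is given here as an assumption, since the manuscript's exact parameterisation is not reproduced in this response. The lag variables ξ and η are notation introduced for this illustration.

```latex
g(\xi, \eta) = g(0,0)\,\exp\!\left(-\frac{\xi^{2} + \eta^{2}}{\omega_{0}^{2}}\right) + g_{\infty}
```

      In this form g(0,0) is the amplitude of the peak at zero lag, ω0 sets its width, and g∞ is the offset that the function approaches at large lags.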

      (2) The TNoP measured in Figure 2 is a quantity about 2000-3000 times greater than the number of "traditionally detected" foci by both methods and the linear relations have very low Y intercepts. Can the authors comment explicitly on the physical interpretation of this number - are 2 to 3 thousand independent particles present within each "focus" detected by traditional means? If so, then what might one "particle" correspond to? (a single secondary antibody or fluorophore? a nucleosome?). In a similar vein, the X intercepts lie at around 25 foci, meaning that in images with fewer than that number of foci detected by ImageJ or Imaris, the ICS method should detect zero TNoP - is this in line with the authors' predictions? Is it possible that a first-order line fit is not the most appropriate relation between the two methods?

      We apologize for our brevity here. Since DA proved to be a more useful metric we did not spend much effort discussing TNoP. TNoP correlates to the number of clustered particles, or non-diffuse fluorophores. TNoP is the inverse of the number of individual particles per nucleus, but the value is not a direct measure of foci. If a sample had no clustering at all, the number of individual particles would be at a maximum and the TNoP would be at a minimum. However, as fluorophores cluster, the number of individual particles (i.e. non-clustered fluorophores) decreases, which increases the TNoP value. Therefore, TNoP has a correlation to the number of foci detected through traditional measurements, as we found here. Yet, TNoP is a relative measurement and cannot be compared across different conditions. Similar to foci counting, TNoP is unable to factor the size or intensity of each cluster, thus DA is a more appropriate quantification of the DNA damage response.

      The value of TNoP is dependent on the fitted point spread function and the area of the nucleus. The y=0 intercept of TNoP is defined by the optical setup and is not expected to necessarily go through x=0. Intriguingly, other groups have found that some foci identified through traditional measurements are actually clusters of multiple smaller foci, thus the concept of what a foci represents is difficult to interpret. Thus, here we aimed to show a general correlation of TNoP with foci count through traditional methods to reflect how ICS is similar to foci counting, then employed DA to overcome the limitations of defining a foci.

      We have tried to clarify this in the text (page 8, line 266).

      (3) Some suggestions to address the robustness of ICS:

      For a given sample (i.e. one segmented nucleus), the calculation of DA and TNoP should be similar between different images of that same nucleus taken at different times, similar to how the number of traditionally detected foci would be fairly invariant. In particular, it should be shown that these values are not just scaling with the higher normalized intensity seen in stronger DDR responses. In the same vein, the linear relationship between TNoP and "foci" should not change even if the confocal settings are slightly different (i.e., higher/lower illumination intensity) as long as the condition stipulated by the authors in the Discussion holds ("ICS can be implemented on any fluorescence image as long as the square relative fluorescence intensity fluctuations are detectable above noise fluctuations."). To show, as the title states, that spatial ICS is a robust tool, it would be desirable to demonstrate this with a series of images of the same cell at the same or varying excitation intensities.

      Thank you for your suggestions. Indeed, the calculation will be the same over sequential images of the same cell. Observations of dose-dependent DA that does not correlate with intensity for RPA1 and RAD51 results (Fig. S5) directly demonstrate that DA does not just scale with intensity.

      We would not expect the TNoP to change with confocal settings; however, we show in Figure 1 that the number of foci does indeed change with intensity settings as captured by thresholds. Therefore, any interpretation of TNoP vs. foci count would be very difficult to make at different microscope settings. To ensure we are fairly comparing ICS to existing analysis, we keep the settings the same and measure changes between conditions.

      (4) More information is needed on how intensity normalization was performed. The Methods states "Measurements across experiments were normalized by the control in each dataset." The DMSO (0mM drug) plots all appear to have a mean of 1.0, so it appears the values for each set of control nuclei were divided by their own mean, and then the values for each set of experimental nuclei were divided by the mean value of all 3 controls as an aggregate; is this correct?

      We apologize for not being more clear. Thank you for raising this point. We normalized data to a control from each experimental group. Thus, in Figures 3, 4, and 5, data were collected over multiple experiments with one control per experiment and each treatment condition included in each experiment. Therefore, we normalized each result to the corresponding control from that imaging session. However, in Figure 8 we ran experiments at much higher throughput with multiple controls per experiment, thus the data were normalized to the overall average of the controls, which is why the control averages are not all at a value of 1. We have clarified this in the text (page 7, line 218).
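      The two normalization schemes described in this response can be contrasted with a minimal Python sketch. It is an illustration only, not the authors' analysis code: the function names, the `(session, condition, value)` record layout, and the `"control"` label are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def normalize_per_session(records):
    """Divide each value by the mean control value of its own imaging session.

    records: list of (session, condition, value) tuples. The condition
    label "control" is a hypothetical marker for control samples.
    """
    control_values = defaultdict(list)
    for session, condition, value in records:
        if condition == "control":
            control_values[session].append(value)
    control_mean = {s: mean(v) for s, v in control_values.items()}
    return [(s, c, v / control_mean[s]) for s, c, v in records]


def normalize_to_pooled_controls(records):
    """Divide every value by the mean of all control values pooled together."""
    pooled = mean(v for _, c, v in records if c == "control")
    return [(s, c, v / pooled) for s, c, v in records]


# Made-up numbers, two imaging sessions with one control each.
records = [
    ("day1", "control", 2.0),
    ("day1", "10uM", 4.0),
    ("day2", "control", 4.0),
    ("day2", "10uM", 6.0),
]
per_session = normalize_per_session(records)
pooled = normalize_to_pooled_controls(records)
```

      With per-session normalization every control comes out at exactly 1. With pooled normalization the individual controls scatter around 1, which matches the behaviour the authors describe for Figure 8.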

      (5) Some more information about the ICS analysis should be given if the full code is not provided - in particular, how the nucleus mask was implemented on the "signal" channel (were the edges abruptly set to zero or was a window function introduced to avoid edge effects in the discrete FFT?).

      Thank you for raising this point. We have added the code to GitHub - github.com/dubachLab/ics. The signal region was established by simply applying the nuclear mask from the DAPI channel to the IF channel. Each region is padded with average intensity value at the edges for 2x the dimensions of the ROI to remove edge effects in the FFT.
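      The masking and padding step described here can be sketched in Python with NumPy. This is not the authors' MATLAB code. The function name, the bounding-box crop, the centred placement of the ROI, and the reading of "2x the dimensions" as a canvas twice the ROI height and width are all assumptions. The autocorrelation itself uses the standard Wiener-Khinchin relation (inverse FFT of the power spectrum).

```python
import numpy as np


def masked_autocorrelation(image, mask):
    """Normalized spatial autocorrelation of a masked region via FFT.

    image: 2-D intensity array (e.g. the IF channel).
    mask:  2-D boolean array (e.g. a nuclear mask from the DAPI channel).
    Returns an array of shape (2h, 2w), where (h, w) is the bounding box
    of the mask; the zero-lag value sits at index [h, w].
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    mean_intensity = image[mask].mean()

    # Crop to the bounding box of the mask.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    roi = image[r0:r1, c0:c1].copy()
    roi_mask = mask[r0:r1, c0:c1]

    # Pixels outside the nucleus are set to the mean nuclear intensity,
    # so they carry zero fluctuation instead of an abrupt edge to zero.
    roi[~roi_mask] = mean_intensity

    # Pad with the same mean value onto a canvas twice the ROI size
    # (assumed interpretation of "2x the dimensions of the ROI").
    h, w = roi.shape
    padded = np.full((2 * h, 2 * w), mean_intensity)
    top, left = h // 2, w // 2
    padded[top:top + h, left:left + w] = roi

    # Wiener-Khinchin: autocorrelation = inverse FFT of the power spectrum.
    fluctuation = padded - mean_intensity
    power = np.abs(np.fft.fft2(fluctuation)) ** 2
    # Dividing by the masked pixel count makes the zero-lag value equal to
    # variance / mean^2 of the masked pixels; nonzero lags share this
    # normalization and are therefore biased toward zero.
    acf = np.fft.ifft2(power).real / (mean_intensity ** 2 * roi_mask.sum())
    return np.fft.fftshift(acf)
```

      Because the padding and the out-of-mask pixels sit exactly at the mean, they add nothing to the fluctuation sum, which is the reason mean padding avoids the artificial edge that zero padding would introduce.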

      Minor comments:

      (1) Figure 3, 4, 5: I think it would aid figure readability if channels were labeled in the images themselves, not just in the legend.

      Thank you for the suggestion. We tried doing this and struggled to fit a label with the layout of the images. We were also concerned about interpretation of data in each column and the potential to assign data to each figure if they were so prominently labeled.

      (2) Supplemental Figures are mislabeled; the order given in the legends is S1, S2, S3, S2, S3. S4 is called out in the main text where it should be S5.

      Thank you for catching this error. We have made the necessary corrections. S4 contains data on cellular response to the drugs, while S5 contains intensity data in response to MMS.

      (3) It should be stated for each Figure what kind of microscopy was performed - I assume that it is confocal for everything except when widefield is explicitly stated, but for clarity please add this information.

      Indeed, this is correct, we have indicated which microscopy was used for each figure.

      (4) The MATLAB code and full (uncropped) Western blots should be provided as supplemental data if possible.

      We have included a GitHub link for the code and un-cropped western blots.

      (5) The p values from significance tests should indicate whether multiple comparisons correction was necessary (if suggested by Prism) and performed.

      Apologies for a lack of clarity but this was not necessary, significance was calculated vs. the next lower dose (e.g. 10 micromolar vs. 1 micromolar). We have clarified this in the methods (page 7 line 221).

      Reviewer #2 (Recommendations for the authors):

      Major points:

      In addition to the weaknesses noted above, to encourage widespread adoption of this method, the authors should make the tools that they used for their analysis publicly available. In a few instances (e.g., compare Figures 3J and 3L), other methods outperform DA. It would be meaningful to discuss when especially DA may be a better measure than others (such as intensity or number of foci).

      We have made code available on GitHub. We expect that results such as those in Figures 3J and 3L, where intensity is significantly higher at the highest concentration but DA is not, are reflective of the underlying biology, and this may be interpreted differently under different experimental conditions. Imaris Spots (Fig. 3K) also does not capture a significant increase at the highest dose of olaparib, suggesting that intensity may rise but it does not generate more foci. These results are likely highly dependent on the mechanism of olaparib at such a high concentration and the DDR response. We are hesitant to draw biological conclusions from these results and instead would like to highlight the capacity of ICS to evaluate the DDR, therefore we don’t want to make any broad comments about different applications.

      Minor points:

      (1) Pg. 12: "We used MMS to induce DNA damage in SKOV3 and OVCA429 cells. As expected, normalized intensity for RPA1 and RAD51 values (Figure S5) did not display a dose dependence on MMS concentration."

      Please provide a citation for the claim that RPA1 and RAD51 normalized intensities do not display a dose dependence on MMS concentration.

      These were data that we generated. We were not expecting an intensity change as that would presumably require increased protein generation in response to MMS, compared to gH2AX where the phospho-specific H2AX is generated in the DDR.

      (2) Pg. 12: "Similar to RPA1, RAD51 does not form distinguishable foci in the nuclei in cells without preextraction (Fig. 5)." Please provide a citation for this claim.

      We did not do pre-extraction and our results don’t produce changes in distinguishable foci. We provided citations discussing how, without pre-extraction, foci formation for these proteins is not obvious (REF 38 and 39).

      (3) I noted that the authors cite one paper [38] apparently showing that RPA and Rad51 do not always form foci, however, this is in the C. elegans germline in response to micro irradiation, therefore I am not sure that it is applicable to human cells.

      We apologize for referencing a paper on C. elegans. Most papers looking at RPA and RAD51 in the DDR use pre-extraction as it seems necessary to observe foci. Therefore, there are not as many papers, that we could find, that do not use pre-extraction. Reference 39 is in HeLa cells.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Page 8, the second paragraph: In the Result section, it is better to describe how the authors carried out immuno-staining (without pre-extract subtraction) and ICS briefly, although the method is described in detail in the Method section.

      Thank you for the suggestion, we have added this description (page 8, line 259).

      (2) In Figure 5K-P: The authors analyzed "invisible" RAD51 foci on the image (Fig. 5L, M, O, and P) without pre-extraction. As a control experiment, it is useful to check whether pre-extraction would provide "visible" RAD51 foci and to examine the similar MMS concentration dependency shown in Figure 5R (or 5T). This would strengthen the power of the ICS analysis.

      Thank you for the suggestion. In our hands, pre-extraction is extremely subjective. We have tried performing pre-extraction but find highly variable results depending on conditions. Therefore, we did not include any pre-extraction here. We expect that performing these experiments may or may not agree with results in Figure 5 largely because we are unable to achieve repeatable pre-extraction foci counting.

      (3) Figure 6D (and 6C) looks very interesting. It would be important to show the interpretation of this correlation shown in the graph. Although the authors argued that ICS analysis results shown in the graph could provide new insight into the DDR (page 14, last line 5), as shown in another part, it is important to carry out the same analysis by using Imaris Spots. Moreover, it is interesting to apply the analysis to RAD51 foci (shown in Figure 5), given that the PARPi effect is enhanced in the absence of RAD51-mediated recombination.

      We completely agree that this analysis may generate interesting results to help interpret the DDR response to PARP inhibition. These experiments are part of an ongoing follow-up study where we extend the use of ICS to other parts of the DDR and investigate protein clustering across several proteins with impact on PARPi response. Therefore, since the focus of this manuscript is introducing ICS as a tool to study the DDR, we believe that omitting those data here does not detract from the central points of the manuscript. We included results in Figure 6 because we wanted to show how ICS could impact DDR research. Furthermore, combined with our advances shown in Figures 7 and 8, we are currently working on adapting ICS to be high-throughput and much simpler than Imaris Spots for handling large datasets needed to generate results like those in Figure 6.

      Minor points:

      (1) Figure 1I, blue arrows: These showed an area with a higher background. Because of a low magnification, it is very hard to see the difference from the other areas of the background. It is better to show a magnified image of the representative region with a higher background.

      We hope that readers can see the higher intensity in the diffuse area. We attempted to construct a zoomed-in area, but that either blocked a significant portion of the non-zoomed image or added complexity to the figure. We have noted that images in Figure S1 are larger and more obviously capture an increase in background intensity.

      (2) Figure 2 legend, line 5, the same as "A)": This should be "B".

      Here, the number of independent particle clusters is intended to be the same as A, the difference is that the independent particles are clusters in C and individual fluorophores in A.

      (3) Page 9, the first paragraph, last line, foci formation, and foci composition: These should be "focus formation and focus composition".

      We have changed this.

      (4) Page 15, the first paragraph, line 5, palbociclib, camptothecin, or etoposide: please explain what kinds of the drugs are.

      We have added that these drugs cause cells to stall at different cell cycle stages. Explaining the drugs would take considerable room in the text.

      (5) Page 16, the first paragraph, line 1, bleomycin: Please explain what this drug is.

      Similar to above, we have stated that this drug causes DNA damage, going into detail would take several sentences.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Triple-negative breast cancer (TNBC) accounts for approximately 15-20% of all breast cancers. Compared to other types of breast cancer, TNBC exhibits highly aggressive clinical characteristics, a greater likelihood of metastasis, poorer clinical outcomes, and lower survival rates. Immunotherapy is an important treatment option for TNBC, but there is significant heterogeneity in treatment response. Therefore, it is crucial to accurately identify immunosuppressive patients before treatment and actively seek more effective therapeutic approaches for TNBC patients.

      Strengths:

      In this work, the authors collected and integrated single-cell and bulk RNA sequencing data to analyze the TME landscape mediated by ferroptosis-related genes. On this basis, a prediction model of prognosis and treatment response of 131 patients was constructed using a machine learning algorithm, which helps to provide individualized and precise treatment guidance for breast cancer patients.

      Thank you for your appreciation of our work. We are encouraged by your positive feedback and will continue to explore new avenues in personalized medicine for breast cancer.

      Weaknesses:

      However, there are still some issues that need to be clarified:

      (1) The description of the research background is too brief and concise, and it is necessary to add some information about the limitations of existing methods and the differences and advantages of this study compared with other published relevant studies, so as to better highlight the necessity and research value of this study.

      Thank you for your suggestions. We have supplemented the research background and compared the differences between this study and other studies, further highlighting the research value of our study.

      (2) This study is a retrospective analysis of a public data set and lacks experimental validation and prospective experiments to support the results of bioinformatics analysis. This should be added to the acknowledgment of limitations in the study.

      Thank you for the constructive feedback. We also acknowledge that the lack of experimental evidence is one of the limitations of this study. Therefore, we plan to conduct in vivo and in vitro experiments in our future research to support the findings of our bioinformatics analysis, and have already supplemented the relevant content in the limitations of Discussion.

      Reviewer #2 (Public review):

      Summary:

      This study aims to explore the ferroptosis-related immune landscape of TNBC through the integration of single-cell and bulk RNA sequencing data, followed by the development of a risk prediction model for prognosis and drug response. The authors identified key subpopulations of immune cells within the TME, particularly focusing on T cells and macrophages. Using machine learning algorithms, the authors constructed a ferroptosis-related gene risk score that accurately predicts survival and the potential response to specific drugs in TNBC patients.

      Strengths:

      The study identifies distinct subpopulations of T cells and macrophages with differential expression of ferroptosis-related genes. The clustering of these subpopulations and their correlation with patient prognosis is highly insightful, especially the identification of the TREM2+ and FOLR2+ macrophage subtypes, which are linked to either favorable or poor prognoses. The risk model thus holds potential not only for prognosis but also for guiding treatment selection in personalized oncology.

      Thank you for your thorough review and insightful comments.

      Weaknesses:

      The study has a relatively small sample size, with only 9 samples analyzed by scRNA-seq. Given the typically high heterogeneity of the tumor microenvironment (TME) in cancer patients, this may affect the accuracy of the conclusions. The scRNA-seq analysis focuses on the expression of ferroptosis-related genes in various cells within the TME. In contrast, bulk RNA sequencing uses data from tumor samples, and the results between the two analyses are not consistent. The bulk RNA sequencing results may not accurately capture the changes happening in the microenvironment.

      Thank you for your constructive feedback. Although this study only included 9 samples, given the limited availability of scRNA-seq datasets for untreated TNBC in public databases, we chose to utilize a dataset that contains a relatively larger number of untreated TNBC samples. We are fully aware of the complexity and high heterogeneity of the TME. Despite the limited sample size, we first conducted rigorous quality control on the data and, based on this, preliminarily revealed the landscape of the TME mediated by ferroptosis-related genes. These findings provide a new perspective for understanding the biological mechanisms underlying the onset and progression of breast cancer. To enhance the reliability and generalizability of our research results, we plan to strive to expand the sample size in future work and consider integrating other omics technologies, such as proteomics and metabolomics, with scRNA-seq data for a more in-depth exploration of the complex interactions within the TME.

      We also agree with your viewpoint that scRNA-seq data reveals gene expression within individual cells, while bulk RNA-seq data reveals the average gene expression in tumor tissues, and there are differences in data acquisition and processing methods between the two. However, we believe that there are also some close connections between them in terms of gene expression levels. By comparing the expression specificity of marker genes for specific cell types in breast cancer tissues, we found that they are correlated with patient prognosis, and the results have been validated in both internal and external validation sets. Thank you once again for your valuable suggestions, which will play an important guiding role in our subsequent research.

      Reviewer #1 (Recommendations for the authors):

      (1) The breast cancer scRNA-seq dataset files of GSE176078 include 10 TNBC primary tumors (DOI:10.1016/j.compbiomed.2023.107066). However, in this study, only 9 cases were listed, please explain the reason for the data exclusion.

      Thank you for your questions. Although it was clearly stated in the original paper that "To elucidate the cellular architecture of breast cancers, we analyzed 26 primary pre-treatment tumors, including 11 ER+, 5 HER2+ and 10 TNBCs, by scRNA-Seq (Supplementary Table 1)," upon downloading and carefully examining the patient information in Supplementary Table 1, we only included 9 patients explicitly labeled as TNBC in our study (https://pmc.ncbi.nlm.nih.gov/articles/PMC9044823/#SD1).

      (2) The description of the technique in the methods section should be more detailed, such as parameter settings, quality control standards, etc.

      Thank you for your valuable suggestions. We have already supplemented the relevant content in the methods section.

      (3) Please check and correct formatting errors to improve readability, such as lines 176 and 177.

      We are sorry for our careless mistakes. Thank you for your reminder. We have corrected “Pseudotime analysis with scRNA-seq data helps to obtain an approximate landscape of gene expression dynamics” to “Pseudotime analysis of scRNA-seq snapshot data helps to provide an approximate landscape of gene expression dynamics”. We have also checked the manuscript further and corrected its formatting errors.

      Reviewer #2 (Recommendations for the authors):

      (1) In multiple sections of the paper, abbreviations are used without being defined when first mentioned.

      We are sorry for our careless mistakes. Thank you for your reminder. We have added definitions for the abbreviations in both the abstract and the main text.

      (2) The authors should analyze whether the transcription factors in Figure 2 are correlated with the expression of ferroptosis-related genes.

      Thank you for your valuable feedback. Some transcription factors in Figure 2 correlate with the expression of ferroptosis-related genes, which we have supplemented in the Discussion.

      (3) Figures 3d and 4e lack explanations for the axis values, and for Figure 4e, is the unit of the y-axis labeled "survival" in days?

      Thank you for your valuable feedback. We apologize for the lack of explanations for the axis values in Figures 3d and 4e and we have made revisions to both figures accordingly. We have noted that the unit "survival" on the y-axis of Figure 4e is in years, and we have already made the necessary supplement to clarify this. Thank you very much for your reminder.

      (4) The authors conducted their analysis using public databases but did not cite the original literature, nor did they discuss the similarities and differences between their findings and those in the original studies.

      Thank you for your valuable suggestions, and we deeply apologize for our carelessness. We have supplemented the original literature in the references and discussed the differences between this study and the original literature in the Discussion.

      (5) Some figures, particularly those involving heatmaps and t-SNE plots (e.g., Figures 1 and 3), present dense and complex data that may be challenging for readers to interpret. The heatmaps (Figure 1e-f and 3d) include many genes, but it is unclear how these genes were selected, and the scale of gene expression differences is difficult to interpret. Simplifying these figures by focusing on the most differentially expressed and clinically relevant genes (e.g., those with prognostic value) would improve readability.

      Thank you for your valuable suggestions. The t-SNE plots in Figures 1 and 3 primarily serve as a dimensionality reduction technique to visually present the clustering of multiple cells or samples based on gene expression, aiding readers in quickly identifying cell subpopulations. The heatmaps, on the other hand, are mainly used to showcase the differential expression of ferroptosis-related genes across different clinicopathological classifications and cell subpopulations, with varying shades of color helping readers quickly recognize gene expression differences among different cell subpopulations. The genes included in the heatmaps (Figures 1e-f and 3d) are sourced from the FerrDb website. We have uploaded the list of ferroptosis-related genes used in this study as Supplementary Table 1 and added the relevant steps in Method 2.3.

      (6) The study analyzes the expression of ferroptosis-related genes in different immune cells within the TME. The authors should discuss how these changes in gene expression may impact the function and behavior of immune cells.

      Thank you for your valuable feedback. We have supplemented the discussion with detailed effects of the main differential genes (FOLR2 and TREM2) on the tumor immune response.

      (7) The authors analyzed the expression of ferroptosis-related genes in immune cells using single-cell sequencing data. However, they subsequently applied the selected genes to perform a risk factor analysis in tumor cells. Is the expression and function of these genes the same in immune cells and tumor cells? This seems questionable.

      Thank you very much for your suggestion. We also believe that there may be differences in the expression and function of genes between immune cells and tumor cells. However, some genes may exhibit similarities in their expression and function in immune cells and tumor cells, especially within the tumor immune microenvironment, due to the complex and tight interactions between immune cells and tumor cells (as shown in Figures 1d and 2h), and their expression levels can be related to the onset, progression, and prognosis of tumors.

      (8) While the risk score model based on ferroptosis-related genes is promising, it lacks experimental validation, which weakens the strength of the conclusions. The authors should consider conducting in vitro or in vivo experiments. These functional studies would provide essential evidence to support the model's predictive capability.

      Thank you for the constructive feedback. We fully recognize the importance of conducting functional studies to substantiate the predictive capability of the model. Therefore, we plan to conduct in vitro and in vivo experiments in our future research to provide the necessary evidence and further validate the model's effectiveness.

      (9) The manuscript predicts sensitivity to 27 drugs based on the risk score, but it lacks mechanistic insight into why patients in the high-risk group might be more responsive to certain drugs. Including a more detailed discussion of the molecular mechanisms underlying this drug sensitivity, particularly linking ferroptosis-related genes to drug metabolism or efficacy, would provide a stronger rationale for the clinical application of these findings.

      Thank you very much for your valuable suggestions. In the discussion, we thoroughly analyzed the mechanism of action of the drugs (ABT-263 and erlotinib) with the greatest difference in sensitivity between high-risk and low-risk groups, as well as their correlation with ferroptosis.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this revised report, Yamanaka and colleagues investigate a proposed mechanism by which testosterone modulates seminal plasma metabolites in mice. The authors identify oleic acid as a particularly important metabolite, derived from seminal vesicle epithelium, that stimulates linear progressive motility in isolated cauda epididymal sperm in vitro. The authors provide additional experimental evidence of a testosterone-dependent mechanism of oleic acid production by the seminal vesicle epithelium.

      Strengths:

      Often, reported epididymal sperm from mice have lower percent progressive motility compared with sperm retrieved from the uterus or by comparison with human ejaculated sperm. The findings in this report may improve in vitro conditions to overcome this problem, as well as add important physiological context to the role of reproductive tract glandular secretions in modulating sperm behaviors. The strongest observations are related to the sensitivity of seminal vesicle epithelial cells to testosterone. The revisions include addition of methodological detail, modified language to reflect the nuance of some of the measurements, as well as re-performed experiments with more appropriate control groups. The findings are likely to be of general interest to the field by providing context for follow-on studies regarding the relationship between fatty acid beta oxidation and sperm motility pattern.

      Thank you for summarizing and your positive evaluation of our study.

      Weaknesses:

      Support for the proposed mechanism is stronger in this revised report than in the previous report, but there are many challenges in measuring sperm metabolism and its direct relationship with motility patterns. This study is no exception and largely relies on correlations between various experiments in lieu of direct testing. Additionally, the discussion is framed from a human pre-clinical perspective, and it should be noted that the reproductive physiology between mice and humans is very different.

      Thank you for pointing out the challenges in our paper. We appreciate your comment on the limited evidence supporting the direct relationship between sperm metabolism and motility patterns under current experimental conditions. Based on your and Reviewer 2’s suggestions, we have decided to remove the experiments on the “effects of OA on sperm metabolism, motility and fertility” (Fig. 7, Supplemental Figure 5A and C-F) and the corresponding parts of the Discussion section from the paper (see also Reviewer 2's main comment). These data mainly show correlations and do not provide direct evidence of causality. Instead, we added a new experiment to the manuscript, in which a lipid mixture that mimics the fatty acid profile secreted testosterone-dependently from seminal vesicle epithelial cells was added to the sperm culture medium (New Supplemental Figure 5, Lines 259-268). In this experiment, motility parameters were measured using CASA. This experiment evaluates the direct effects of lipid exposure on sperm motility. With these revisions, we are able to focus on the metabolic changes caused by testosterone in seminal vesicle epithelial cells, which are the central focus of our research. We have added a short statement acknowledging the potential importance of OA and our intention to more rigorously investigate the role of OA in sperm function in subsequent studies (Lines 402-407).

      Furthermore, we have revised the text to clearly state the limitations arising from species differences and to clarify that the translational aspects to humans are speculative (Lines 383-384, 395-397, 408-410).

      We appreciate your guidance. We believe that these changes will strengthen our research.

      Reviewer #2 (Public review):

      Using a combination of in vivo studies with testosterone-inhibited and aged mice with lower testosterone levels as well as isolated mouse and human seminal vesicle epithelial cells the authors show that testosterone induces an increase in glucose uptake. They find that testosterone induces a difference in gene expression with a focus on metabolic enzymes. Specifically, they identify increased expression of enzymes regulating cholesterol and fatty acid synthesis, leading to increased production of 18:1 oleic acid. The revised version strengthens the role of ACLY as the main regulator of seminal vesicle epithelial cell metabolic programming. 18:1 oleic acid is secreted by seminal vesicle epithelial cells and taken up by sperm, inducing an increase in mitochondrial respiration. The difference in sperm motility and in vivo fertilization in the presence of 18:1 oleic acid and the absence of testosterone, however, is small. Additional experiments should be included to further support that oleic acid positively affects sperm function.

      Thank you very much for carefully reading the manuscript and for your comments. We appreciate your understanding that the role of ACLY in metabolic programming of seminal vesicle epithelial cells has been strengthened in the revised version. On the other hand, we agree with your view that the increase in sperm motility and fertilization rate by oleic acid is minimal under the current experimental conditions. We agree that further evidence is needed to support our conclusion regarding the positive effects of oleic acid on sperm function. Based on your comments and our re-evaluation of the data, we have decided to remove the experiments and discussion on “OA and sperm motility” from the current paper (Fig. 7, Supplemental Figure 5A and C-F). In the revised paper, we have significantly toned down our previous claims on the role of oleic acid and instead focused on the metabolic regulatory mechanisms of seminal vesicle epithelial cells.

      We hope that these revisions address your concerns and improve the overall clarity of the manuscript.

      Recommendations for the authors:

      Note from the reviewing editor: The reviewers agree that the revised manuscript is significantly improved and view the work as important. Both reviewers agree that the evidence for testosterone effects on seminal vesicle epithelial cells to support fatty acid synthesis is strong and suggest that the authors tone down their conclusion of oleic acid effect on sperm motility as the effect is very small. With these minor changes, the evidence to support the conclusion of the study is viewed as solid.

      Thank you for recognizing the improvements that we have made to our manuscript and for appreciating the importance of our research. We also appreciate your assessment that the evidence for the effect of testosterone on seminal vesicle epithelial cells that support fatty acid synthesis is solid.

      On the other hand, we agree with the two reviewers that the effect of oleic acid on sperm motility is limited and that the relevant data do not measure a direct relationship. Therefore, we have decided to withdraw the data set on the effect of oleic acid on sperm (Fig. 7, Supplemental Figure 5A and C-F) and focus this paper on seminal vesicle epithelial cells (in response to reviewer 2's suggestion). Given that testosterone-induced lipid (fatty acid) synthesis in seminal vesicle epithelial cells is a key aspect of our study, we have included additional experiments in the revised manuscript to show how lipids affect sperm (New Supplemental Figure 5, Lines 259-263).

      With these revisions, the manuscript emphasizes the importance of testosterone-dependent fatty acid synthesis in seminal vesicle epithelial cells and the fact that this includes oleic acid. The title has also been partially revised in line with these revisions.

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) The authors indicate in the methods that extracellular flux analysis was normalized to cell count. However, the y-axis units in Figs 4, 8, 9 and SFig 9 are not normalized.

      (2) The OA label appears to be missing from Fig 7A. Additionally, the scale bar is offset in one of the images and the length of the scale bar does not appear to be mentioned in the figure legend.

      Thank you for raising these points. We have corrected them.

      Fig. 7 has been withdrawn in response to Reviewer 2's suggestion.

      Reviewer #2 (Recommendations for the authors):

      With the experiments included in their revised version the authors strengthen their conclusions about testosterone-induced metabolic reprogramming in seminal vesicle cells resulting in reduced proliferation. The experiments surrounding ACLY are well-designed and give insights into the underlying molecular mechanisms. For other parts, the manuscript became less clear and it is often hard to follow the authors' line of thought for their conclusions.

      Based on the experiments shown in the manuscript this reviewer is still not convinced that OA positively affects sperm function. The changes in linear motility are minor, blastocyst levels are lower and the authors do not show that OA alone positively affects cleavage rate during AI. Without additional experiments that show a stronger effect on sperm function, the authors should consider focusing the manuscript exclusively on seminal vesicle epithelial cells.

      Thank you for your constructive comments on our paper. We thank the reviewer for pointing out that the effect of oleic acid (OA) on sperm function is limited in our current experiments. As reviewer 1 also pointed out, we agree that further experiments and improved methodology are needed to reliably demonstrate the functional effects of OA on sperm. Because the strength of the data on the direct relationship between fatty acids in seminal fluid and improved sperm function is currently insufficient, we have removed the data set for oleic acid and sperm motility (Fig. 7, Supplemental Figure 5A and C-F) and focused on “the mechanism of metabolic regulation of testosterone in seminal vesicle epithelial cells”. We have consistently narrowed the focus of the paper to the theme of “how testosterone changes energy metabolism in seminal vesicle epithelial cells”. In accordance with this change, the structure of the paper has also been partially revised (red text in the manuscript). With these revisions, the main point of the paper focuses on the mechanism by which testosterone regulates metabolic pathways in the seminal vesicle epithelial cells.

      For more detailed revisions, please see the responses to your comments below.

      (1) 45-55 still need major revision. It will not become clear to the reader what the authors mean by epididymal maturation. 'Ability to fertilize in in vitro?' Epididymal sperm are moving linearly in the absence of seminal vesicle fluid. Increased progressive motility, hyperactivation, and the ability to undergo the acrosome reaction are induced upon exposure to seminal vesicle fluid. The authors should introduce the concept of capacitation and that capacitation can be induced in vitro by exposure to bicarbonate and a cholesterol acceptor.

      Thank you for pointing out the ambiguity of epididymal maturation, the need to clarify the concept of capacitation, and the role of seminal plasma in this context. The revised text explains that epididymal maturation only gives sperm their potential ability to fertilize. It also explains that it is the subsequent capacitation process (inducible in vitro by incubation with bicarbonate and cholesterol acceptors) that gives full fertilization potential. On the other hand, we emphasize that in vivo, seminal plasma, which contains both capacitation-promoting and decapacitation factors, plays a key role in fine-tuning the timing of capacitation, ensuring that sperm acquire fertilization competence at the appropriate moment. We hope that these revisions clarify our intended meaning and strengthen the overall message of the paragraph. (lines 42-54)

      “Sperm that have completed spermatogenesis in the testis acquire their potential to fertilize while maturing in the epididymis (5–7). The physiological change of sperm during fertilization process are collectively referred to as “capacitation”. This change includes a large amplitude of flagella (called hyperactivation) and developing the capacity to undergo the acrosome reaction, and can be induced by culturing sperm collected from the epididymis in a medium containing bicarbonate and cholesterol acceptors (8, 9). However, once capacitation is complete, sperm cannot maintain that state for a long time. Therefore, even if epididymal sperm that have not been exposed to seminal plasma are artificially inseminated into the cervix or uterus, the fertilization rate remains low (10–12). That is because, in vivo, during ejaculation, exposure of epididymal sperm to seminal plasma masks the unintended capacitation as they pass through the female reproductive tract and ensures fertilization of sperm that reach the oviduct (13). In other words, seminal plasma plays an important role in fine-tuning the timing of sperm capacitation and in maintaining the sustained sperm motility needed to reach the oviduct.”

      (2) 81: As in their rebuttal, the authors should further elaborate on the connection between fructose, citrate, and testosterone. That still does not become clear. Based on the authors' explanation in the rebuttal, why are citrate and fructose levels higher when the animals are castrated?

      We thank you for the opportunity to clarify our statement regarding the relationship between fructose, citrate, and testosterone. Our original explanation was intended to reflect the fact that testosterone from the testes has a stimulating effect on the accessory reproductive glands, and to report that the concentrations of fructose and citric acid were higher in the non-castrated (control) animals than in the castrated animals. In castrated animals, the absence of testosterone leads to decreased activity of these glands and, consequently, lower levels of these metabolites. To make this clear, we have revised the manuscript as follows. (lines 76-82)

      “Several specific factors produced by the male accessory glands that contribute to seminal plasma and impact male fertility have been elucidated. For example, surgical removal of seminal vesicles in male mice and rats was associated with infertility (17, 22, 23). The observations that fructose (24) and citric acid (25) concentrations in seminal plasma of control mice and rats are higher than in castrated animals suggest that the specific metabolism of the accessory glands might be affected by testosterone derived from the testes, which activate intracellular androgen receptors (AR; NR3C4) required for gene regulation of transcription.”

      (3) 111: This reviewer does not understand the author's obsession with reporting linear motility. Sperm are moving linearly when isolated from the epididymis. Again, increase of progressive motility is a well-defined hallmark of capacitation and primarily used in the field when discussing changes in sperm motility during capacitation. This reviewer is assuming that the changes in progressive vs linear motility in Fig. 7 are not significant because the data is more scattered. The % increase seems to be approximately the same. The same is true for Fig. 8. The increase in LIN is so small and not dose-dependent that this reviewer is not comfortable making that one of the main conclusions of the manuscript.

      Our claim is based on the observation that seminal vesicle secretions significantly improve the linear motility (VSL and LIN) of sperm even in an environment that does not contain capacitation-inducing factors such as BSA. We interpret this as a survival strategy for sperm to pass through the female reproductive tract efficiently. Therefore, we believe that “progressive motility” in the context of conventional capacitation does not have the same meaning as the progressive motility observed in seminal plasma.

      However, we agree with the reviewer that the current data set does not sufficiently support an interpretation of the minor increase in linear motility caused by oleic acid. Therefore, we have decided to withdraw the dataset on the effect of oleic acid on sperm motility (Fig. 7, Supplemental Figure 5A and C-F) and have revised the conclusion. (Lines 406-410)

      (4) 128: For the mitochondrial membrane potential measurements the authors should mention that they included antimycin as a control. The manuscript would benefit from including scatter plots with unloaded controls to support their gating strategy. In its current stage, the gating between low and high membrane potential seems arbitrary.

      Thank you for pointing this out. We have included an explanation of antimycin as a control in the main text (Lines 920-921). In addition, we have added some reference scatter plots and also added an explanation of the gating strategy between low and high membrane potentials (Supplemental Figure 1C and D, Lines 1101-1104). We hope this change will make the manuscript clearer.

      (5) 190: What do the authors mean by: 'However, there was no difference in the Oligomycin-sensitive ECAR, indicating that testosterone may increase glucose metabolism but does not enhance the expression of a group of enzymes involved in the glycolytic pathway.'

      Our original intention was to state that testosterone probably increases basal glycolytic flux via increased glucose uptake (as supported by the GLUT4 translocation data), but does not increase maximal glycolytic capacity, as indicated by the lack of difference in oligomycin-sensitive ECAR.

      However, as Reviewer 1 previously pointed out, we agree that the assay conditions themselves, such as the use of oligomycin to inhibit oxidative mitochondria, may create non-physiological conditions and not fully reflect the energy distribution in vivo. Under these conditions, there is a possibility that the flow of glycolysis will increase artificially as a compensatory reaction, and parameters such as “maximum glycolytic capacity” should have been interpreted with caution.

      Therefore, we have revised the manuscript to clarify that our data are a single-time point under defined experimental conditions and do not necessarily provide direct insight into changes in expression or activity of individual glycolytic enzymes.

      “These data indicate that testosterone enhances glucose utilization. This leads to the interpretation that testosterone increases the flow of glycolysis by increasing glucose uptake and alters metabolic flux distribution.” (Lines 186-188)

      (6) 205: Could the authors elaborate further on how they came to this conclusion: 'These results suggest that testosterone does not reduce transient enzyme activity in mitochondria but rather weakens the metabolic pathway of the mitochondrial TCA cycle and/or the electron transport chain due to the changes in gene expression patterns in seminal vesicle epithelial cells.' Based on their results at this point the authors have no insights about changes in enzyme activity or gene expression that might explain the phenotype.

      Our statement is based on the following observations. In testosterone-treated cells, the addition of glucose increased ECAR, suggesting an increase in glycolytic flux due to an increase in glucose uptake. On the other hand, mitochondrial respiratory parameters (basal respiration, oligomycin-sensitive respiration, FCCP-uncoupled respiration, and reserve respiratory capacity) were significantly decreased under testosterone treatment.

      From these results, it was speculated that testosterone promotes the redistribution of metabolic flux, directing it away from mitochondrial oxidative phosphorylation and towards the glycolytic pathway and, possibly, lipid synthesis. However, as the reviewers correctly point out, at this point, we have not directly measured changes in the activity or expression of individual enzymes in the TCA cycle or ETC. Therefore, in the next experiment, we extracted mRNA from the cells and performed gene expression analysis using real-time PCR. To make this clear, we have revised the manuscript as follows.

      “Overall, these data indicate that testosterone promotes the redistribution of metabolic flux. In other words, testosterone increased glycolysis in seminal vesicle epithelial cells while decreasing mitochondrial respiration. To determine whether these changes were accompanied by changes in gene expression of specific metabolic-related enzymes, we analyzed gene expression levels.” (Lines 201-205)

      (7) 219: Characterizing ACLY as an enzyme of the ETC is misleading. ACLY is a cytosolic enzyme that connects the TCA cycle with fatty acid synthesis.

      We would like to thank you for pointing out that the description of the function of ACLY could be misunderstood. We agree that characterizing ACLY as an enzyme of the ETC could be misleading. Therefore, we have revised the sentence to clearly indicate that ACLY is a cytosolic enzyme that links the TCA cycle with fatty acid synthesis. The revised text is as follows:

      "Interestingly, testosterone significantly increased the expression of Acly, which encodes a cytoplasmic enzyme that converts citrate transported from the TCA cycle into acetyl-CoA, a substrate that is essential for fatty acid synthesis." (lines 216-218)

      (8) 228: Which results support that ETC proteins were upregulated by flutamide?

      We thank the reviewer for raising this point. In preliminary experiments, we analyzed ETC protein expression using real-time qPCR. Our data show that treatment with flutamide significantly upregulates the expression of genes involved in mitochondrial ETC, such as mtND6, while decreasing the expression of the lipogenic genes Acly and Acc. These additional data are now presented in Supplementary Figure S3B. (lines 223-226)

      (9) 245: Aren't the authors showing in Fig. 5 that glut4 expression is reduced in seminal vesicle epithelial cells upon testosterone treatment? How does that fit into the author's hypothesis?

      Thank you for pointing this out. We have already responded to a similar comment from Reviewer 3 in a previous revision. Please refer to our response to Reviewer 3 in a previous version.

      (10) 285: Based on the author's results OA increases the oocyte cleavage rate but then reduces the rate of blastocyst to cleaved oocyte. Doesn't that mean OA affects negatively early development?

      We thank the reviewer for the insightful comment. The one-hour pre-treatment is designed to reflect the transient exposure of sperm to the seminal plasma during ejaculation. In this context, it is unlikely that such a short exposure would impair the overall developmental potential of the embryo. However, although pre-conditioning with oleic acid does not ultimately affect the development of the offspring, it may lead to a decrease in the blastocyst rate at a certain point (approximately 96-120 hours after fertilization). We agree that additional research is needed to demonstrate this.

      Therefore, because the experiments related to the effects of oleic acid on sperm and fertilization are currently incomplete, we have decided to withdraw them for future research.

      (11) 305: What happens to pyruvate and lactate levels when ACLY expression is reduced?

      We appreciate the reviewer’s question regarding the fate of pyruvate and lactate when ACLY expression is reduced. In the absence of testosterone (Ctrl), the expression level of ACLY decreases. Under this condition, the concentration of pyruvate in the culture medium increased compared with that in the testosterone-treated condition (Testo; Fig. 4D,E). This probably reflects the fact that when the expression of ACLY is suppressed, the rate at which the products of the glycolytic pathway are converted to the fat-producing pathway (i.e., the conversion of citrate to acetyl-CoA) decreases.

      On the other hand, lactate levels did not change significantly. This suggests that the flow of lactate production via lactate dehydrogenase is relatively constant, independent of metabolic reprogramming by ACLY.

      Therefore, our data suggest that a decrease in ACLY expression leads to a decrease in pyruvate demand, while lactate production is maintained. We interpret these findings as supporting the idea that ACLY is important for directing the carbon produced by the glycolytic pathway to lipid synthesis (by transporting citrate from the mitochondria).

      We hope that this explanation clarifies the interpretation of the data.

      Minor revision:

      189: ECAR: extracellular acidification rate. Please correct.

      We have corrected this. (Lines 184-185)

      199: Pyruvate is not synthesized, it is metabolized from PEP. Please correct.

      The following corrections have been made. “pyruvate is metabolized from phosphoenolpyruvic acid through glycolysis”. (Lines 194-195)

      In addition, minor revisions were made to improve the clarity of the overall text.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      (1) This manuscript introduces a useful curation pipeline of antibody-antigen structures downloaded from the PDB database. The antibody-antigen structures are presented in a new database called AACDB, alongside annotations that were either corrected from those present in the PDB database or added de-novo with a solid methodology. Sequences, structures, and annotations can be very easily downloaded from the AACDB website, speeding up the development of structure-based algorithms and analysis pipelines to characterize antibody-antigen interactions. However, AACDB is missing some key annotations that would greatly enhance its usefulness.

      Here are detailed comments regarding the three strengths above:

      I think potentially the most significant contribution of this database is the manual data curation to fix errors present in the PDB entries, by cross-referencing with the literature. However, as a reviewer, validating the extent and the impact of these corrections is hard, since the authors only provided a few anecdotal examples in their manuscript.

      I have personally verified some of the examples presented by the authors and found that SAbDab appears to fix the mistakes related to the misidentification of antibody chains, but not other annotations.

      (a) "the species of the antibody in 7WRL was incorrectly labeled as "SARS coronavirus B012" in both PDB and SabDab" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the PDB annotation.

      (b) "1NSN, the resolution should be 2.9 Å, but it was incorrectly labeled as 2.8 Å" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the PDB annotation.

      (c) "mislabeling of antibody chains as other proteins (e.g. in 3KS0, the light chain of B2B4 antibody was misnamed as heme domain of flavocytochrome b2)" → SAbDab fixes this as well in this case.

      (d) "misidentification of heavy chains as light chains (e.g. both two chains of antibody were labeled as light chain in 5EBW)" → SAbDab fixes this as well in this case.

      I personally believe the authors should make public the corrections made, and describe the procedures - if systematic - to identify and correct the mistakes. For example, what was the exact procedure (e.g. where were sequences found, how were the sequences aligned, etc.) to find mutations? Was the procedure run on every entry?

      We appreciate the reviewer’s valuable feedback. Our correction procedures combined manual curation with systematic sequence analysis. While most metadata discrepancies were resolved through cross-referencing original literature, we implemented a structured approach for identifying mutations in specific cases. For PDB entries labeled as variants (e.g., "Bevacizumab mutant" or "Ipilimumab variant Ipi.106") where the "Mutation(s)" field was annotated as "NO," we retrieved the canonical therapeutic antibody sequence from Thera-SAbDab, then performed pairwise sequence alignment against the PDB entry using the BLAST program to identify mutated residues.

      This procedure was not applied to all entries, as mutations are context-dependent. Therapeutic antibodies have well-defined reference sequences, enabling systematic alignment. For antibodies lacking unambiguous wild-type references (e.g., research-grade or non-therapeutic antibodies), mutation annotations were directly inherited from the PDB or literature.
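
      The alignment step described above can be illustrated with a minimal sketch. It uses a small pure-Python Needleman-Wunsch global alignment as a stand-in for BLAST, which the response names without giving parameters. The scoring values, the function names `global_align` and `list_substitutions`, and the toy sequences are all assumptions made for illustration; none of them come from AACDB or Thera-SAbDab.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns two gapped strings."""
    n, m = len(ref), len(query)
    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if ref[i - 1] == query[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                a.append(ref[i - 1])
                b.append(query[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(query[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def list_substitutions(reference, query):
    """Report substitutions as <ref residue><1-based ref position><query residue>."""
    aligned_ref, aligned_query = global_align(reference, query)
    substitutions = []
    ref_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        # Gapped columns (insertions/deletions) are skipped in this sketch.
        if r != "-" and q != "-" and r != q:
            substitutions.append(f"{r}{ref_pos}{q}")
    return substitutions


# Hypothetical reference and variant fragments (not real database entries).
reference = "EVQLVESGGGLVQPGG"
variant = "EVQLVESGAGLVQPGG"
print(list_substitutions(reference, variant))
```

      A real pipeline would more likely call BLAST or another established aligner with a substitution matrix; the point here is only that, once a trusted reference sequence exists, mutated positions fall out of the aligned columns directly.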

      All corrections have been publicly archived in AACDB. We have added a detailed discussion of this issue in the section “2.3 Metadata” of the revised manuscript.

      (2) I believe the splitting of the pdb files is a valuable contribution as it standardizes the distribution of antibody-antigen complexes. Indeed, there is great heterogeneity in how many copies of the same structure are present in the structure uploaded to the PDB, generating potential artifacts for machine learning applications to pick up on. That being said, I have two thoughts both for the authors and the broader community. First, in the case of multiple antibodies binding to different epitopes on the same antigen, one should not ignore the potentially stabilizing effect that the binding of one antibody has on the complex, thereby enabling the binding of the second antibody. In general, I urge the community to think about what is the most appropriate spatial context to consider when modeling the stability of interactions from crystal structure data. Second, and in a similar vein, some antigens occur naturally as homomultimers - e.g. influenza hemagglutinin is a homotrimer. Therefore, to analyze the stability of a full-antigen-antibody structure, I believe it would be necessary to consider the full homo-trimer, whereas, in the current curation of AACDB with the proposed data splitting, only the monomers are present.

      We sincerely appreciate the reviewer’s insightful comments regarding the splitting of PDB files and we appreciate the opportunity to address the reviewer’s thoughtful concerns.

      Firstly, when two antibodies bind to distinct epitopes on the same antigen, we would like to clarify that this scenario can be divided into two cases based on the experimental context: Case 1: When two antibodies bind to distinct epitopes on the same antigen, and their complexes are determined in separate structures. For example, SAR650984 (PDB: 4CMH) and daratumumab (PDB: 7DHA) target CD38 at non-overlapping epitopes. These two antibody-antigen complexes were determined independently, and their structures do not influence each other. Case 2: When the crystal structure contains a ternary complex with two antibodies and an antigen, as in the example of 6OGE discussed in Section 2.2 of our manuscript. After reviewing the original literature, the experiment confirmed that the order of Fab binding does not affect the formation of the ternary complex, and the binding of one antibody does not enhance the binding of the other. This supports the rationale for splitting 6OGE into two separate structures. However, we acknowledge that not all ternary complexes in the PDB provide such detailed experimental descriptions in their original literature. We agree with the reviewer that in some cases, one antibody may stabilize the structure to facilitate the binding of a second antibody. For instance, in 3QUM, the 5D5A5 antibody stabilizes the structure, enabling the binding of the 5D3D11 antibody to human prostate-specific antigen. Such sandwich complexes are indeed valuable for identifying true epitopes and paratopes. Importantly, splitting the structure does not alter the interaction sites.

      Secondly, we fully agree with the reviewer that for antigens that naturally exist as homomultimers (e.g., influenza hemagglutinin as a homotrimer), the full multimeric structure should be considered when analyzing stability. In such cases, users can directly utilize the original PDB structures provided in their multimeric form. Our splitting approach is intended to provide an additional option for cases where monomeric analysis is sufficient or preferred, but it does not preclude the use of the original multimeric structures when necessary.

      (3) I think the manuscript is lacking in justification about the numbers used as cutoffs (1 Å² for change in SASA and 5 Å for maximum distance for contact). The authors just cite other papers applying these two types of cutoffs, but the underlying physico-chemical reasons are not explicit even in these papers. I think that, if the authors want AACDB to be used globally for benchmarks, they should provide direct sources of explanations of the cutoffs used, or provide multiple cutoffs. Indeed, different cutoffs are often used (e.g. ATOM3D uses 6 Å instead of 5 Å to determine contact between a protein and a small molecule https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c45147dee729311ef5b5c3003946c48f-Abstract-round1.html). I think the authors should provide a figure with statistics pertaining to the interface atoms. I think showing any distribution differences between interface atoms determined according to either strategy (number of atoms, correlation between change in SASA and distance...) would be fundamental to understanding the two strategies. I think other statistics would constitute an enhancement as well (e.g. proportion of heavy vs. light chain residues).

      Some obvious limitations of AACDB in its current form include:

      AACDB only contains entries with protein-based antigens of at least 50 amino acids in length. This excludes non-protein-based antigens, such as carbohydrate- and nucleotide-based, as well as short peptide antigens.

      AACDB does not include annotations of binding affinity, which are present in SAbDab and have been proven useful both for characterizing drivers of antibody-antigen interactions (cite https://www.sciencedirect.com/science/article/pii/S0969212624004362?via%3Dihub) and for benchmarking antigen-specific antibody-design algorithms (cite https://www.biorxiv.org/content/10.1101/2023.12.10.570461v1).

      We thank the reviewer for raising this critical point about the cutoff values used in AACDB. In the current study, the selection of the threshold values is objective: the thresholds chosen in the manuscript summarize those used in the existing literature, and we have provided more literature support in the manuscript. In established tools, the criteria for defining interacting amino acids typically do not set the ΔSASA cutoff above 1 Å² or the distance cutoff above 6 Å. While our manuscript emphasizes widely accepted thresholds for consistency with prior benchmarks, AACDB explicitly provides raw ΔSASA and distance values for all annotated residues. Users can dynamically filter the data from downloaded files by excluding entries exceeding their preferred thresholds (e.g., selecting 5 Å instead of 6 Å). This ensures adaptability to diverse research needs. In the revised version, we reset the distance threshold to 6 Å and recalculated the interacting amino acids in order to give the user a wider range of choices. In the section “3.2 Database browse and search” of the revised manuscript, we provide a description of the flexible choice of thresholds for practical use.
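
      The user-side filtering mentioned in this response could look like the following minimal sketch. The record layout (keys `chain`, `resnum`, `resname`, `min_distance`, `delta_sasa`) is a hypothetical stand-in, since the actual column names of the AACDB download files are not given here; the values are invented, and treating the cutoffs as inclusive is likewise an assumption.

```python
# Hypothetical per-residue interface records; values are invented for illustration.
records = [
    {"chain": "H", "resnum": 31, "resname": "SER", "min_distance": 3.2, "delta_sasa": 25.4},
    {"chain": "H", "resnum": 52, "resname": "TYR", "min_distance": 5.6, "delta_sasa": 0.4},
    {"chain": "L", "resnum": 91, "resname": "TRP", "min_distance": 4.8, "delta_sasa": 12.0},
]


def filter_by_distance(rows, max_distance):
    """Keep residues whose closest partner atom is within max_distance (in Å)."""
    return [r for r in rows if r["min_distance"] <= max_distance]


def filter_by_delta_sasa(rows, min_delta_sasa):
    """Keep residues that bury at least min_delta_sasa (in Å^2) upon binding."""
    return [r for r in rows if r["delta_sasa"] >= min_delta_sasa]


# Tightening the distance cutoff from 6 Å to 5 Å drops the residue at 5.6 Å.
at_6 = [r["resnum"] for r in filter_by_distance(records, 6.0)]
at_5 = [r["resnum"] for r in filter_by_distance(records, 5.0)]
print(at_6, at_5)
```

      Because the raw values are kept per residue, a stricter cutoff never requires recomputing anything from the structure; it is a plain filter over the downloaded table.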

      Furthermore, distance and ΔSASA are two distinct metrics for evaluating interactions. Distance directly quantifies spatial proximity between atoms, reflecting physical contacts such as van der Waals interactions or hydrogen bonds, and is ideal for identifying direct spatial adjacency. ΔSASA, on the other hand, measures changes in solvent accessibility of residues during binding, capturing the contribution of buried surfaces to binding free energy. Even for residues not in direct contact, reduced SASA due to conformational changes may indicate indirect functional roles.
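
      The distance criterion can be sketched as below, assuming a residue counts as interacting when any of its atoms lies within the cutoff of any atom on the partner molecule. The atom tuples, coordinates, and the function name `contact_residues` are hypothetical, and the ΔSASA criterion is not shown because it needs a solvent-accessibility algorithm that is beyond a short example.

```python
import math


def contact_residues(atoms_a, atoms_b, cutoff=5.0):
    """Return sorted residue ids from atoms_a having any atom within
    `cutoff` Å of any atom in atoms_b.

    Each atom is a pair: ((chain, residue_number), (x, y, z)).
    """
    hits = set()
    for residue_id, xyz_a in atoms_a:
        for _, xyz_b in atoms_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                hits.add(residue_id)
                break  # one close atom pair is enough for this atom
    return sorted(hits)


# Toy coordinates (in Å), invented for illustration only.
antibody_atoms = [
    (("H", 31), (0.0, 0.0, 0.0)),
    (("H", 52), (8.5, 0.0, 0.0)),
    (("L", 91), (20.0, 0.0, 0.0)),
]
antigen_atoms = [
    (("A", 100), (3.0, 0.0, 0.0)),
]

print(contact_residues(antibody_atoms, antigen_atoms, cutoff=5.0))
print(contact_residues(antibody_atoms, antigen_atoms, cutoff=6.0))
```

      The all-pairs loop is quadratic in the number of atoms; for full structures a spatial index (for example a k-d tree) would be the usual choice, but the definition of a contact stays the same.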

      As demonstrated through comparisons on the detailed information pages, the sets of interacting amino acids defined by these two methods differ by only a few residues, with no significant variation in their overall distributions. However, since interaction patterns vary significantly across different complexes, analyzing residue distributions across all structures using both criteria is not feasible.

      We thank the reviewer for highlighting these limitations. AACDB currently focuses on protein-based antigens ≥50 amino acids to prioritize structural consistency, which excludes non-protein antigens and shorter peptides. While affinity annotations are critical for benchmarking antibody design tools, these data were not integrated in this release due to insufficient data verification caused by internal team constraints. We acknowledge these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      Reviewer #2:

      Summary:

      Antibodies, thanks to their high binding affinity and specificity to cognate protein targets, are increasingly used as research and therapeutic tools. In this work, Zhou et al. have created, curated, and made publicly available a new database of antibody-antigen complexes to support research in the field of antibody modelling, development, and engineering.

      Strengths:

      The authors have performed a manual curation of antibody-antigen complexes from the Protein Data Bank, rectifying annotation errors; they have added two methods to estimate paratope-epitope interfaces; they have produced a web interface that is capable of both effective visualisation and of summarising the key useful information in one page. The database is also cross-linked to other databases that contain information relevant to antibody developability and therapeutic applications.

      Weaknesses:

      The database does not import all the experimental information from PDB and contains only complexes with large protein targets.

      Thank you for the valuable feedback. As noted in our response to Reviewer 1, due to limitations within our team, comprehensive data integration from PDB has not been achieved in the current version. We acknowledge the significance of expanding the database to encompass a broader range of experimental information and complexes with diverse target sizes. Regrettably, immediate updates to address these limitations are not feasible at this time. Nevertheless, we are committed to enhancing the database in upcoming upgrades to provide users with a more comprehensive and inclusive resource.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 194: "produce" → "produced"

      We thank the reviewer for the feedback. We have checked the grammar and spelling carefully in the revised manuscript.

      (2) As mentioned in the public review, I think adding binding affinity annotations would greatly enhance the use cases for the database.

      We thank the reviewer for the suggestion. As noted in our response to the public review, due to team constraints these data are not integrated into this release but are being collated. We recognize these gaps and plan to expand antigenic diversity and incorporate affinity metrics in future updates.

      (3) I think adding a visualization of interface atoms and contacts on an entry's webpage would be useful for someone exploring specific entries. It also would be useful if the authors provided a pymol command to select interface residues since that's a procedure any structural biologist is likely to do.

      We sincerely appreciate the reviewer’s constructive suggestions. In response to the request for enhanced visualization and accessibility of interface residue information, we have implemented the following improvements: (1) Web Interface Visualization. On the entry-specific webpage, we have added an interactive visualization window that highlights the antigen-antibody interaction interface using distinct colors. The interaction interface visualization has been incorporated into Figure 5 of the revised manuscript, with a detailed description. (2) PyMOL Command Accessibility. The “Help” page now provides step-by-step PyMOL commands to select and visualize interface residues.

      (4) I think the authors should provide headers to the files containing interface residues according to the change-in-SASA criterion, as they do for those computed according to contact. This would avoid unnecessary confusion - however slight - and make parsing easier. I was initially confused by the meaning of the last column, though after a minute I understood it to be the change in SASA.

      We thank the reviewer for the detailed feedback and the suggestion. We have provided headers for the files of the interacting residues defined by ΔSASA.

      (5) Line 233: "AACDB's data processing pipeline supports mmCIF files" → The meaning and implications of this statement are not obvious to me, and are mentioned nowhere else in the paper. Do you mean that in AACDB there are structure entries that the RCSB PDB database only has in mmCIF file format, and not .pdb format? So, effectively, there are some entries in AACDB that are not in any other antibody-specific database? I checked and, as of Dec 3rd, 2024, there are 41 structures in AACDB that are NOT in SAbDab. Manually checking 5 of those 41 structures, none are mmCIF-only structures.

      We thank the reviewer for the valuable comment. Some entries contain too many atoms and polymer chains to be represented in a single PDB-format file, so the PDB distributes these structures only as mmCIF files. In AACDB, 47 entries, such as 7SOF, 7NKT, 7B27, and 6T9D, are available from the PDB only in mmCIF format. The “.pdb” and “.cif” files store atomic coordinates in distinct text formats, and each structure file is split automatically based on the manually annotated antibody-antigen chains. We have therefore built support for both formats into our file processing pipeline, enabling fully automated file splitting. Additionally, we employed Naccess to calculate solvent-accessible surface areas. However, since this software only accepts .pdb format files as input, we also converted all split .cif files into .pdb format within our fully automated pipeline. We apologize for the lack of clarity in the original manuscript and have included a more detailed explanation in the "2.2 PDB Splitting" section of the revised manuscript.
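
      To illustrate the idea of splitting a structure file by annotated chains, a minimal sketch is given below. It is not the AACDB pipeline: the function names, the role labels, and the truncated records are hypothetical, and it handles only PDB-format text, where the chain ID occupies column 22. An mmCIF file would need a dedicated parser.

```python
from collections import defaultdict

def atom_line(serial, name, resname, chain, resseq):
    """Build a truncated PDB-style ATOM record for the example.
    Coordinates are omitted and atom-name alignment is simplified."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"

def split_by_role(lines, chain_roles):
    """Group ATOM/HETATM records by the role assigned to their chain ID.

    In PDB-format files the chain ID sits in column 22 (string index 21).
    Records from chains without an annotation are dropped.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            role = chain_roles.get(line[21])
            if role is not None:
                groups[role].append(line)
    return dict(groups)

# Hypothetical manual annotation: which chains are antibody, which antigen
chain_roles = {"H": "antibody", "L": "antibody", "A": "antigen"}
records = [
    atom_line(1, "N", "GLU", "H", 1),
    atom_line(2, "CA", "ASP", "L", 1),
    atom_line(3, "N", "THR", "A", 333),
    atom_line(4, "O", "HOH", "W", 1),  # unannotated chain, dropped
]
parts = split_by_role(records, chain_roles)
print({role: len(lines) for role, lines in parts.items()})
```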

      Reviewer #2:

      (1) In SabDab and PDB, experimental binding affinities are also reported: could the authors comment on whether they also imported this information and double-checked it against the original paper? If it wasn't imported, that might discourage some users and should be considered as an extension for the future.

      We thank the reviewer for the comment and the suggestion. As noted in our response to the public review, due to current resource constraints, quantitative affinity data have not been incorporated into this release but are undergoing systematic curation. We explicitly recognize these limitations and propose a two-pronged strategy for future iterations: (1) broadening antigen diversity coverage through expanded structural sampling, and (2) integrating quantitative binding affinity measurements. In the Discussion section, we have included a description outlining the planned enhancements.

      (2) Line 49-50: the references mentioned in connection to deep learning methods for antibody-antigen predictions seem a bit limited given the amount of articles in this field, with 3 of 4 references on one method only (SEPPA), could the authors expand this list to reflect a bit more the state of the art?

      We thank the reviewer for the suggestion. We agree that more relevant studies should be listed and therefore more references are provided in the revised manuscript.

      When mentioning the limitations of the existing databases, it feels a bit that the criticism is not fully justified. For instance:

      Line 52-53: could the authors elaborate on the reasons why such an identification is challenging? (Isn't it possible to make an efficient database-filtered search? Or rather, should one highlight that a more focussed resource is convenient and why?)

      Thank you for the feedback. In this study, the keywords "antibody complex," "antigen complex," and "immunoglobulin complex" were employed during data collection. PDB returned over 30,000 results, of which only one-tenth met our criteria after rigorous filtering. This demonstrates that keyword searches, while useful, inherently limit result precision and introduce substantial redundancy, likely due to the PDB's search mechanism. This is why we highlighted the significant challenges in identifying antibody-antigen complexes among general protein structures in the PDB.

      Line 55: reading the website http://www.abybank.org/abdb/, it would be fairer to say that the web interface lacks updates, as the database and the code have gone through some updates. Could the authors provide a concrete example of the reason why: 'The AbDb database currently lacks proper organization and management of this valuable data.'?

      We thank the reviewer for highlighting this issue. In our original manuscript, the statement that the AbDb database "lacks proper organization and management" was based on the absence of an explicit statement regarding data updates on its official website at the time of submission, even though internal updates to its content may have occurred. We fully respect the long-standing contributions of AbDb to antibody structural research, and our comments were solely directed at the specific state of the database at that time. As the reviewer noted, following the release of our preprint, we have also taken note of AbDb's recent updates. To reflect the latest developments and avoid potential misinterpretation, we have revised the original statement in the revised manuscript.

      Also 'this rapid updating process may inadvertently overlook a significant amount of information that requires thorough verification,': it's difficult for me to understand what this means in practice. Could the authors clarify if they simply mean that SabDab collects information from PDB and therefore tends to propagate annotation errors from there? If yes, I think it's enough to state it in these terms, and for sure I agree that the reason is that correcting these annotation errors requires a substantial amount of work.

      We thank the reviewer for providing such detailed feedback on the manuscript. We acknowledge that SabDab represents a highly valuable contribution to the field, and its rapid update mechanism has significantly advanced related research areas. However, as stated by the reviewer, we aim to clarify that SabDab primarily relies on automated metadata extraction from the PDB for annotation, and its rapid update process inherits raw data from upstream sources. According to their paper, manual curation is only applied when the automated pipeline fails to resolve structural ambiguities. This workflow, which depends on PDB annotations with limited manual verification, may propagate errors present in the PDB. Examples include species misannotation and mutation status misinterpretation. We fully agree with the reviewer's observation that correcting errors in such cases necessitates labor-intensive manual curation, which is a core motivation for our study.

      Line 86: why 'Structures that consisted solely of one type of antibody were excluded'? Why exclude complexes with antigens shorter than 50 amino acids? These complexes are genuine antibody-antigen complexes.

      We thank the reviewer for the valuable question. The AACDB database is dedicated to curating structural data of antigen-antibody complexes. Structures featuring only a single antibody type are classified as free antibodies and systematically excluded from the database due to the absence of protein-bound partners. During data screening, we retained sequences shorter than 50 amino acids by categorizing them as peptides rather than eliminating them outright. The current release exclusively encompasses complexes with protein-based antigens. Meanwhile, complexes involving peptide, hapten, and nucleic acid antigens are undergoing systematic curation, with planned inclusion in future updates to broaden antigen category representation.

      Line 96 needs a capital letter at the beginning.

      Line 107: 'this would generate' → 'this generates' (given it is something that has been implemented, correct?).

      Line 124: missing an 'of'.

      Line 163: inspiring by -> inspired by.

      Thank you for the feedback. All of the above grammatical and spelling errors have been corrected in the manuscript.

      Line 109-111: apart from the example, it would be good to spell out the general rule applied to anti-idiotypic antibodies.

      We thank the reviewer for the valuable feedback. For anti-idiotypic antibody complexes, the partner antibody is treated as a dual-chain antigen, necessitating individual evaluation of heavy chain and light chain interactions with the anti-idiotypic component. We have given a general rule for anti-idiotypic antibodies in section “2.2 PDB splitting” of the revised manuscript.

      Line 155-159: could the authors provide references for the two choices (based on sasa and any-atom distance) that they adopted to define interacting residues?

      We thank the reviewer for the comment and the suggestion. As in our response to Reviewer #1 in the public review, the definition of interacting residues and the thresholds chosen in the manuscript are based on existing literature. We have added supporting references in section “1. Introduction”. Our resource does not provide a fixed amino acid list. Instead, all interacting residues are explicitly documented alongside their corresponding ΔSASA (solvent-accessible surface area changes) and intermolecular distances, allowing researchers to flexibly select residue pairs based on customized thresholds from downloadable datasets. Furthermore, to align with widely adopted criteria in the current literature, where interactions are defined by ΔSASA >1 Ų and atomic distances <6 Å, we have recalculated the interacting residues in the revised version, replacing the previous 5 Å distance threshold with a 6 Å cutoff.
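
      As an illustration of selecting residues with customized thresholds, a minimal sketch follows. The column names, the tab-separated layout, and the example values are assumptions made here and do not reflect the actual AACDB download format. The defaults mirror the ΔSASA >1 Ų and distance <6 Å criteria stated above.

```python
import csv
import io

# Hypothetical excerpt of a residue table; column names and values are
# invented for this example and are not the real AACDB file layout.
TABLE = """chain\tresidue\tdelta_sasa\tmin_distance
H\tTYR52\t35.2\t3.1
H\tSER53\t0.4\t5.8
L\tTRP91\t12.7\t6.5
L\tGLY92\t0.0\t9.3
"""

def select_residues(text, min_dsasa=1.0, max_distance=6.0, mode="either"):
    """Keep residues passing the ΔSASA and/or distance thresholds.

    mode="either" keeps a residue if one criterion is met;
    mode="both" requires both criteria.
    """
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    selected = []
    for row in rows:
        buried = float(row["delta_sasa"]) > min_dsasa
        close = float(row["min_distance"]) < max_distance
        keep = (buried or close) if mode == "either" else (buried and close)
        if keep:
            selected.append(row["chain"] + ":" + row["residue"])
    return selected

print(select_residues(TABLE))                    # default thresholds
print(select_residues(TABLE, mode="both"))       # stricter: both criteria
print(select_residues(TABLE, max_distance=5.0))  # previous 5 Å cutoff
```

      Tightening the distance cutoff from 6 Å to 5 Å in this toy table drops the one residue that qualified by distance alone, which is the kind of sensitivity a user can explore from the downloadable data.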

      Line 176-178: could the authors re-phrase this sentence to clarify what they mean by 'change in the distribution'?

      We thank the reviewer for the suggestion. Our search was conducted with an end date of November 2023. However, Figure 3B includes an entry dated 2024. Upon reviewing this record, we identified that the discrepancy arises from the supersession of the 7SIX database entry (originally released in December 2022) by the 8TM1 version in January 2024. This version update explains the apparent chronological inconsistency. We regret any lack of clarity in our original description and have revised the corresponding section in the manuscript to explicitly clarify this change of database.

      Caption Figure 3: please spell out all the acronyms in the figure. Provide the date when the last search was performed (i.e., the date of the last update of these statistics).

      We thank the reviewer for the comment. We have systematically expanded all acronyms and included update dates for statistics in the legend of Figure 3. Corresponding changes have also been made to the statistical pages on the website.

      Finally, it would be advisable to do a general check on the use of the English language (e.g. I noted a few missing articles). In Figure 5 DrugBank contains typos.

      We sincerely appreciate the reviewer's meticulous attention to linguistic precision. We have corrected the typographical error in Figure 5 and conducted a comprehensive review of the entire manuscript to ensure accuracy and clarity.

    1. Author response:

      We are highly appreciative of your constructive criticism and that you found our findings to be of interest and significance. Based on your helpful suggestions, we plan to revise the paper as follows:

      (1) Although ETFDH is reduced, but not mutated, across neoplasia, we appreciate your point pertaining to the catalytic activity of ETFDH. To this end, in the revision we plan to compare the effects of rescues using wild-type ETFDH or one of the MADD-associated mutants with compromised catalytic activity.

      (2) We intend to measure steady-state nucleotide levels as a function of ETFDH status in the cell. If time and/or funding allow, we will also perform appropriate labelling experiments.

      (3) We will revise the text of the manuscript to address the minor points raised by the reviewers.

      Again, we would like to thank you for your helpful comments, which we aim to address as outlined above, and we hope they will further improve our report.

    1. Author response:

      We sincerely thank all reviewers for their thoughtful, detailed, and supportive evaluations of our manuscript. We are very pleased that the reviewers appreciated the integrative approach of our study, the quality of the imaging and analyses, and the insights provided into the parallel evolution of biomineralization mechanisms in sponges and corals.

      We are carefully considering all the suggestions made, including those regarding the improvement of figure clarity and the clarification of certain image interpretations. These comments are extremely valuable, and we are preparing a detailed point-by-point reply to accompany our revised manuscript.

      It was also brought to our attention that the links to the Zenodo repository were incorrect. We apologize for this oversight and any inconvenience it may have caused and will update the links in our revised manuscript. In the meantime, the correct Zenodo repositories can be accessed using the following links:

      https://zenodo.org/records/14755899

      https://zenodo.org/records/13847772

      We again thank the reviewers for their constructive feedback, which will help us to further strengthen the manuscript.

    1. Author response:

      We thank the editors and reviewers for their thoughtful and constructive evaluation of our manuscript, “Krüppel Regulates Cell Cycle Exit and Limits Adult Neurogenesis of Mushroom Body Neural Progenitors in Drosophila.” We are pleased that all reviewers recognised the novelty and significance of identifying Krüppel (Kr) as a key transcription factor promoting timely termination of mushroom body neuroblast (MBNB) proliferation, and the potential antagonistic function of Kr-h1.

      We appreciate the helpful suggestions aimed at improving the mechanistic clarity and presentation of our findings. Below, we outline how we plan to address the major points raised in the full revision.

      (1) Characterisation of the KrIf-1 allele and Kr expression

      We agree that clarifying the nature of the KrIf-1 allele is important. In response to this concern, we will examine Kr expression in KrIf-1 mutant larval, pupal, and adult brains using immunostaining and available reporter lines. These experiments will help determine whether the observed neuroblast retention phenotype correlates with altered Kr expression in MBNBs.

      (2) Regulatory relationships between Kr, Kr-h1, Imp, Syp, Chinmo, and E93

      We are currently performing additional experiments to clarify the interactions among these temporal factors. For instance, we are testing whether Kr-h1 overexpression alters the expression of Imp, Syp, and E93. We have obtained a published E93 antibody from Dr Chris Doe (Syed et al., 2017) and will include E93 expression analysis in our revised manuscript.

      While Chinmo is of interest, its expression is well established to be regulated downstream of Imp/Syp via mRNA stability (Liu et al., 2015; Ren et al., 2017). Given that we currently lack reliable tools to assess Chinmo levels, we will focus primarily on Imp, Syp, and E93 as readouts for Kr/Kr-h1 function. If we succeed in obtaining Chinmo antibodies or reporter lines in time, we will include corresponding data.

      (3) Expression of Kr-h1 in MBNBs

      We fully agree that direct evidence for Kr-h1 expression in MBNBs is important. To address this, we have obtained the Kr-h1::GFP BAC transgenic line (BDSC #96786) and are currently using it to assess Kr-h1 expression in MBNBs. We also tested an anti–Kr-h1 antibody previously reported by Kang et al. (2017), developed in the context of fat body studies, but it did not yield clear signals in larval MBNBs. However, previous work by Shi et al. (2007) clearly demonstrated Kr-h1 expression in the developing MB, including MBNBs, using a custom antibody developed by their lab. We also contacted the Lee lab to request this antibody, but unfortunately, it is no longer available. We will include the results obtained using the GFP BAC line in the revised manuscript and, if needed, pursue RNA in situ hybridisation to further validate Kr-h1 expression in MBNBs.

      (4) Temporal Kr knockdown and MARCM analysis

      We appreciate the suggestion to validate our RNAi-based temporal knockdown results using MARCM. We plan to perform MBNB-specific MARCM analysis following the strategy described by Rossi et al. (2020). However, this approach requires additional time due to the logistics of acquiring the necessary fly stocks, generating appropriate genetic combinations, and conducting clonal analyses. While we will make every effort to include these data, we note that RNAi-based knockdown offers the advantage of temporal reversibility and has been essential for assessing stage-specific requirements in our current study.

      (5) Details of the targeted genetic screen

      Kr was initially identified as part of a broader, ongoing effort to screen for candidate transcription factors and cell cycle regulators involved in neuroblast cell cycle exit and/or quiescence. As this screen is still preliminary and incomplete, we prefer not to include the full dataset at this stage. Instead, we will revise the manuscript to clarify that Kr was prioritised for further investigation based on the striking MBNB-specific phenotype observed upon RNAi-mediated knockdown and in the KrIf-1 mutant, rather than through a completed screening process.

      (6) Clarifying the model (Figure 6D) and interactions

      We will revise the proposed model to distinguish between experimentally supported interactions and speculative ones. As noted above, we will primarily focus on the Imp/Syp and E93 axis in relation to Kr and Kr-h1 activity. Chinmo will be omitted from the model unless further data become available to support its inclusion.

      (7) Clarifications on figures and data presentation

      We appreciate the feedback on figure clarity. We will revise figures such as 1B, 2C, and 3A to improve legibility and presentation. We will also correct typographical errors and figure references, and clarify the activity patterns of the GAL4 drivers. Specifically, while UASmCD8::GFP expression driven by OK107-GAL4 is markedly weaker in MBNBs than in their neuronal progeny (as seen, for example, in Figure S3C), the driver remains active and functionally relevant in MBNBs. We believe the weak expression in MBNBs likely explains the absence of a NB retention phenotype in OK107>KrIR adult brains (see main text, Lines 374–376). As suggested by the reviewer, we will clarify this point earlier in the manuscript and can include additional data showing OK107>GFP expression patterns in pupal MB lineages as supplementary material.

      (8) Analysis of public datasets

      We will include results from our analysis of publicly available datasets such as FlyAtlas2, modENCODE, and a time-course RNA-seq dataset specific to MBNBs (Liu et al., 2015). While the spatial resolution of FlyAtlas2 and modENCODE is limited, the MBNB dataset provides valuable temporal information up to 36 h after puparium formation (APF). From this dataset, we observe that Kr expression remains consistently low throughout development, with only a modest increase at 84 h ALH (mean TPM ~11) and 36 h APF (~7), suggesting it does not undergo strong transcriptional regulation in MBNBs. In contrast, Kr-h1 is highly expressed during early larval stages (24–84 h ALH; mean TPM ~55–60) and shows a marked suppression by 36 h APF (mean TPM ~2), consistent with its proposed role in promoting MBNB proliferation. Importantly, Eip93F (E93) exhibits a reciprocal pattern to Kr-h1—with minimal expression until 84 h ALH (mean TPM ~24), followed by a substantial induction at 36 h APF (mean TPM ~104), aligning with its known role in triggering neuroblast termination. These temporal expression dynamics support our model that Kr-h1 and E93 function in opposition during the transition from proliferative to terminating neuroblast states. We will summarise these findings in the revised manuscript, along with appropriate discussion of dataset limitations.
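
      The size of these changes can be made concrete with a small calculation. This is a minimal sketch using the approximate mean TPM values quoted above for 84 h ALH and 36 h APF; the pseudocount of 1 and the choice of 55 for Kr-h1 (quoted as ~55–60 across larval stages) are assumptions made here, not part of the analysis described in the response.

```python
import math

# Approximate mean TPM values quoted in the text: (84 h ALH, 36 h APF).
# Kr-h1 is quoted as ~55-60 for larval stages; 55 is assumed here.
tpm = {
    "Kr": (11, 7),
    "Kr-h1": (55, 2),
    "E93": (24, 104),
}

def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of after/before, with a pseudocount (an assumption of
    this sketch) to damp ratios between very low expression values."""
    return math.log2((after + pseudocount) / (before + pseudocount))

for gene, (larval, pupal) in tpm.items():
    print(f"{gene}: log2FC = {log2_fold_change(larval, pupal):+.2f}")
```

      On these rounded values, Kr-h1 and E93 change in opposite directions by several-fold, while Kr changes by less than two-fold.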

      We hope this provisional response conveys our strong commitment to thoroughly addressing the reviewers’ concerns and improving the manuscript. We are currently carrying out additional experiments and will submit a revised version with new data and enhanced clarity in due course.

      References:

      Kang et al., 2017. Sci Rep. 7(1):16369. doi: 10.1038/s41598-017-16638-1.

      Shi et al., 2007. Dev Neurobiol. 67(11):1614–1626. doi: 10.1002/dneu.20537.

      Rossi et al., 2020. eLife. 9:e58880. doi: 10.7554/eLife.58880.

      Liu et al., 2015. Science. 350(6258):317–320. doi: 10.1126/science.aad1886.

      Ren et al., 2017. Curr Biol. 27(9):1303–1313. doi: 10.1016/j.cub.2017.03.018.

      Syed et al., 2017. eLife. 6:e26287. doi: 10.7554/eLife.26287.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Garcia et al. describes how the expression of a respiratory chain alternative oxidase (AOX) from the tunicate Ciona intestinalis, capable of transferring electrons directly from reduced coenzyme Q (CoQ) to oxygen, is able to induce an increase in the mass of Drosophila melanogaster larvae and an accelerated development, especially when the larvae are kept at low temperatures. In order to explain this phenomenon, the paper addresses the modifications in the activity and levels of the 'canonical' electron transfer system (ETS), i.e., complexes I-IV and of the ATP synthase. In addition, the abundance of different metabolites as well as the NAD+/NADH ratios are measured, finding significant differences between the larvae.

      Strengths:

      The observations of differences in growth, body mass and food intake in the wt D. melanogaster larvae vs. those expressing the AOX transgene are solid. The evidence that mild uncoupling of the ETS might accelerate development of the fly larvae is convincing.

      We appreciate the reviewer’s attention to our results and hope we can improve the manuscript to address all criticism appropriately.

      Weaknesses:

      Some of the observations, especially those concerning the origin of the metabolic remodelling in AOX-expressing larvae, are left unexplained, and the argumentation is somewhat speculative. What the authors mean by "reconfiguration" of the mitochondrial electron transfer system is not clear. If this implies that there is an actual change in ETS function and/or structural organisation in the presence of AOX, this conclusion is not supported by the experimental data. In addition, the influence of AOX activity in the mitochondrial ETS system is tested in vitro in the presence of saturating concentrations of substrates. The real degree to which AOX activity is actually influencing ETS activity in vivo remains unknown.

      Indeed, the term “reconfiguration” may seem a little too strong. However, we do have preliminary structural data on larval mitochondria indicating that the term is adequate in this context. We plan to work on obtaining concrete data to sustain our claims that AOX imparts significant functional and structural remodeling of the organelle, which would be consistent with our respirometry and BN-PAGE data. If the data turns out not to be robust enough, we will consider replacing the term with one that better reflects our findings.

      We also realize that the in vivo data we are presenting (body mass, mobility, food intake) are indirect measurements of metabolism and that a more direct approach is necessary to assess the real degree to which AOX influences ETS activity in vivo. To address this issue, we plan to expand our pharmacological treatments of the larval development and to measure whole larval oxygen consumption.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents intriguing findings about the role of alternative oxidase (AOX) from the tunicate Ciona intestinalis in accelerating growth and development when expressed in Drosophila melanogaster.

      Strengths:

      The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures. The present study provides valuable insights into AOX's role in Drosophila development. The paper attempts to explore a unique mechanism by which AOX influences Drosophila development, providing insights into mitochondrial respiration and its physiological effects. This is relevant for understanding mitochondrial dysfunction and potential therapeutic applications. The study employs a variety of approaches, including calorimetry, infrared thermography, and genetic analyses, to investigate AOX's impact on metabolism and development.

      We sincerely thank the reviewer for recognizing the strengths and acknowledging the novelty of our study.

      Weaknesses:

      There are a number of methodological limitations and substantial gaps in the interpretation of the data presented, which reduce the strength of its conclusions. For instance, there is a misunderstanding of the non-proton motive nature of the AOX - it does not uncouple respiration, but merely decouples it, as it neither contributes to nor dissipates the proton motive force, in contrast to chemical uncouplers or proton uncouplers such as UCPs. The authors need to reassess their data in light of the above.

      The reviewer is absolutely right about the non-proton motive nature of AOX. We will reassess our data considering that AOX decouples respiration and, if necessary and possible, we will add new experiments to address the methodological limitations raised by the reviewer.

    1. Author response:

      We appreciate the reviewers' positive feedback on our paper. We especially thank them for their evaluation of the genetic analysis, which required a significant amount of time. We acknowledge that several aspects of our interpretation and description of the results need correction, as noted by both reviewers. Additionally, we recognize the importance of providing a more comprehensive overview of previous findings, including those from studies conducted in mice, in the manuscript. In the revised version, we will thoroughly address the reviewers' concerns.

      Both reviewers emphasized the need for further validation to ascertain whether the specific requirement of Hox genes in the Hoxba and Hoxbb clusters for pectoral fin bud formation is due to their expression patterns or the functional roles of Hox proteins. This consideration has been on our agenda for some time; however, our submitted paper does not sufficiently address this aspect. In the revised manuscript, we will conduct a comprehensive analysis of the expression patterns of Hox genes in zebrafish to draw informed conclusions on this matter.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate how the viscoelasticity of the fingertip skin can affect the firing of mechanoreceptive afferents and they find a clear effect of recent physical skin state (memory), which is different between afferents. The manuscript is extremely well-written and well-presented. It uses a large dataset of low threshold mechanoreceptive afferents in the fingertip, where it is particularly noteworthy that the SA-2s have been thoroughly analyzed and play an important role here. They point out in the introduction the importance of the non-linear dynamics of the event when an external stimulus contacts the skin, to the point at which this information is picked up by receptors. Although clearly correlated, these are different processes, and it has been very well-explained throughout. I have some comments and ideas that the authors could think about that could further improve their already very interesting paper. Overall, the authors have more than achieved their aims, where their results very much support the conclusions and provoke many further questions. This impact of the previous dynamics of the skin affecting the current state can be explored further in so many ways and may help us to better understand skin aging and the effects of anatomical changes of the skin.

      At the beginning of the Results, it states that FA-2s were not considered as stimuli did not contain mechanical events with frequency components high enough to reliably excite them. Was this really the case, did the authors test any of the FA-2s from the larger dataset? If FA-2s were not at all activated, this is also relevant information for the brain to signal that it is not a relevant Pacinian stimulus (as they respond to everything). Further, afferent receptive fields that were more distant to the stimulus were included, which likely fired very little, like the FA-2s, so why not consider them even if their contribution was low?

      Thank you for bringing this up. We have now clarified in the text that while FA-2s did respond at a low rate during the experiment, their responses were not reliably driven by the force stimuli. In the Methods section we have included the following text:

      “Initially, 10 FA-2 neurons were also included in the analysis. But their responsiveness during the experiment was remarkably low, and unlike the other neuron types, their responses were rarely affected by force stimuli. Specifically, only one of the observed FA-2 neurons responded during the force protraction phases. Due to the lack of clear stimulus-driven responses, FA-2 neurons were subsequently excluded from further analysis.”

      One question that I wondered throughout was whether you have looked at further past history in stimulation, i.e. not just the preceding stimulus, but 2 or 3 stimuli back? It would be interesting to know if there is any ongoing change that can be related back further. I do not think you would see anything as such here, but it would be interesting to test and/or explore in future work (e.g. especially with sticky, forceful, or sharp indentation touch). However, even here, it could be that certain directions gave more effects.

      This is a very interesting question! A discernible effect from the previous stimulus could persist at the end of the current stimulation (see Figure 4C), potentially influencing the next one—a 2-stimuli-back effect. Unfortunately, our experimental design did not allow for rigorous testing of this effect. While all possible pairs of stimulus directions were included in immediately consecutive trials, this was not the case for pairs separated by additional trials. Hence, the combination of a likely weak effect and limited variation in history precluded a thorough analysis of a 2-stimuli-back effect. Future work should delve into the time course of the viscoelastic effect in greater detail.

      Did the authors analyze or take into account the difference between receptive field locations? For example, did afferents more on the sides have lower responses and a lesser effect of history?

      An investigation into the potential impact of the relationship between the receptive field location on the fingertip skin and the primary contact site of the stimulus surface revealed no discernible influence for SA-1 and SA-2 neurons. In contrast, FA-1 neurons, particularly those predominantly sensitive to the previous stimulation or displaying mixed sensitivity, exhibited a tendency to terminate near the primary stimulation site. We have added these observations to the text:

      “We found no straightforward relationship between a neuron's sensitivity to current and previous stimulation and its termination site in fingertip skin. Specifically, there was no statistically significant effect of the distance between a neuron's receptive field center and the primary contact site of the stimulus surface on whether neurons signaled current, prior, or mixed information for SA-1 (Kruskal-Wallis test H(2)=3.86, p= 0.15) or SA-2 neurons (H(2)=0.75, p=0.69). However, a significant difference emerged for FA-1 neurons (H(2)=8.66, p=0.01), indicating that neurons terminating closer to the stimulation site on the flat part of the fingertip were more likely to signal past or mixed information.”
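
      The p-values quoted above can be cross-checked by hand. A Kruskal-Wallis H statistic is conventionally referred to a chi-square distribution with k − 1 degrees of freedom, and for 2 degrees of freedom the upper-tail probability has the closed form exp(−H/2). The sketch below assumes that chi-square approximation was used (the response does not say how p was computed); the function name is illustrative.

```python
import math

# H statistics and p-values as quoted in the response.
# Three groups (current, prior, mixed) give 2 degrees of freedom.
reported = {"SA-1": (3.86, 0.15), "SA-2": (0.75, 0.69), "FA-1": (8.66, 0.01)}


def chi2_sf_df2(h: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-h / 2.0)


p_values = {name: chi2_sf_df2(h) for name, (h, _) in reported.items()}

for name, (h, p_reported) in reported.items():
    print(f"{name}: H(2) = {h:.2f}, p = {p_values[name]:.3f} (reported {p_reported})")
```

      Under this assumption the recomputed values round to the reported ones, so the quoted statistics are internally consistent.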

      Was there anything different in the firing patterns between the spontaneous and non-spontaneously active SA-2s? For example, did the non-spontaneous show more dynamic responses?

      The firing patterns of both spontaneously and non-spontaneously active SA-2 neurons shared similarities in terms of adaptation and range of firing rate modulation in response to force stimuli, i.e., ‘dynamic response’. The distinction lay in the pattern of modulation of the firing rate associated with stimulus presentations. For spontaneously active SA-2 neurons, this modulation occurred around a significant background discharge, implying that a force stimulus could either decrease or increase the firing rate, depending on how it deformed the fingertip. This characteristic is well illustrated by the firing pattern of the neuron depicted in the lower panels of Figure 3D. Conversely, in non-spontaneously active SA-2 neurons, a force stimulus could only induce an increase in the firing rate or no change. Although the neuron depicted in the upper panels of Figure 3D exhibited some background activity, it serves to exemplify this characteristic. In the text, we have elucidated the dynamics of the SA-2 neuron response by highlighting that force stimulation can either decrease or increase the firing rate in neurons with spontaneous activity through the following addition/change:

      “This increased variability was most evident during the force protraction phase where most neurons exhibited the most intense responses. Increased variability was also observed in instances where the dynamic response to force stimulation involved a decrease in the firing rate (lower panels of Figure 3D). This phenomenon was observed in SA-2 neurons that maintained an ongoing discharge during intertrial periods (cf. Fig. 2A). In these cases, the response to a force stimulus constituted a modulation of the firing rate around the background discharge, signifying that a force stimulus could either decrease or increase the firing rate depending on the prevailing stimulus direction.”

      Were the spontaneously active SA-2 afferents firing all the time or did they have periods of rest - and did this relate to recent stimulation? Were the spontaneously active SA-2s located in a certain part of the finger (e.g. nail) or were they randomly spread throughout the fingertip? Any distribution differences could indicate a more complicated role in skin sensing.

      SA-2 neurons, in general, are well-known for undergoing significant post-stimulation depression (e.g., Knibestöl and Vallbo, 1970; Chambers et al., 1972; Burgess and Perl, 1973). In our force stimulations, this post-excitatory depression manifested as a reduced or absent response during the latter part of the stimulus retraction period for stimuli in directions that markedly excited the neuron. The excitability recovered when the fingertip relaxed during the subsequent intertrial period, and for "spontaneously active" neurons, the firing resumed (see examples in Figure 7A). Furthermore, some “spontaneously active” neurons could be silenced or exhibit a near-silent period during force stimulation for certain force directions, while the spontaneous firing returned during the upcoming intertrial period when the fingertip shape recovered (for example, see responses to stimulation in the proximal and especially ulnar directions in the top panel in Figure 7A).

      Regarding the location of the receptive field centres of spontaneously active and non-spontaneously active SA-2 neurons on the fingertip we did not observe any obvious spatial segregation. To illustrate this, we have revised Figure 1A by color-marking SA-2 neurons that exhibited ongoing activity in intertrial periods, and the figure caption has been modified accordingly:

      “Figure 1. Experimental setup. A. Receptive field center locations shown on a standardized fingertip for all first-order tactile neurons included in the study, categorized by neuron type. Purple symbols denote spontaneously active SA-2 neurons exhibiting ongoing activity without external stimulation.”

      Did the authors look to see if the spontaneous firing in SA-2s between trials could predict the extent to which the type 1 afferents encode the proceeding stimulus? Basically, does the SA-2 state relate to how the type 1 units fire?

      We found no clear indications that the responses of FA-1 and SA-1 could be readily anticipated based on the firing patterns of SA-2 neurons.

      In the discussion, it is stated that "the viscoelastic memory of the preceding loading would have modulated the pattern of strain changes in the fingertip differently depending on where their receptor organs are situated in the fingertip". Can the authors expand on this or make any predictions about the size of the memory effect and the distance from the point of stimulation?

      We have explored this topic further in the text, referring to recent studies modeling essential aspects of fingertip mechanics. However, in our view, current models lack the capability to predict the specific nature sought by the reviewer. These models should include a detailed understanding of the intricate networks of collagen fibers anchoring the pulp tissue at the distal phalangeal bone and the nail. They should also consider potential inherent directional preferences of the receptor organs, attributed to their microanatomy. The text modifications are as follows:

      “In addition to the receptor organ locations, the variation in sensitivity among neurons to fingertip deformations in response to both previous and current loadings would stem from the fingertip’s geometry and its complex composite material properties. Possible inherent directional preferences of the receptor organs, attributed to their microanatomy, could also be significant. However, mechanical anisotropy, particularly within the viscoelastic subcutaneous tissue of the fingertip induced by intricately oriented collagen fiber strands forming fat columns in the pulp (Hauck et al., 2004), is likely to play a crucial role. This anisotropy would shape the dynamic pattern of strain changes at neurons' receptor sites, intricately influencing a neuron's sensitivity not only to current but also to preceding loadings. Indeed, recent modeling efforts suggest that such mechanical anisotropy strongly influences the spatiotemporal distribution of stresses and strains across the fingertip (Duprez et al., 2024).”

      Relatedly, we have included additional text to provide a more comprehensive explanation of the “bulk deformation” of the fingertip that occurs during the loadings:

      “As pressure increases in the pulp, the pulp tissue bulges at the end and sides of the fingertip. Simultaneously, the tangential force component amplifies the bulging in the direction of the force while stretching the skin on the opposite side.”

      In the discussion, it would be good if the authors could briefly comment more on the diversity of the mechanoreceptive afferent firing and why this may be useful to the system.

      The diversity in responses among neurons is instrumental in enhancing the information transmitted to the brain by averting redundancy in information acquisition. This diversity thereby contributes to an overall increase in information. We've included a brief statement, along with several references, underscoring this concept:

      "The resulting diversity in the sensitivities of neurons might enhance the overall information collected and relayed to the brain by the neuronal population, facilitating the discrimination between tactile stimuli or mechanical states of the fingertip (see Rongala et al., 2024; Corniani et al., 2022; Tummala et al., 2023, for more extensive explorations of this idea)."

      Also, the authors could briefly discuss why this memory (or recency) effect occurs - is it useful, does it serve a purpose, or is it just a by-product of our skin structure? There are examples of memory in the other senses where comparisons could be drawn. Is it like stimulus adaptation effects in the other senses (e.g. aftereffects of visual motion)?

      We have expanded the concluding paragraph of the discussion, specifically delving into the question of whether the mechanical memory effect serves a deliberate purpose or is simply an incidental byproduct of our skin structure:

      “In any case, the viscoelastic deformability of the fingertips plays a pivotal role in supporting the diverse functions of the fingers. For example, it allows for cushioned contact with objects featuring hard surfaces and allows the skin to conform to object shapes, enabling the extraction of tactile information about objects' 3D shapes and fine surface properties. Moreover, deformability is essential for the effective grasping and manipulation of objects. This is achieved, among other benefits, by expanding the contact surface, thereby reducing local pressure on the skin under stronger forces and enabling tactile signaling of friction conditions within the contact surface for control of grasp stability. Throughout, continuous acquisition of information about various aspects of the current state of the fingertip and its skin by tactile neurons is essential for the functional interaction between the brain and the fingers. In light of this, the viscoelastic memory effect on tactile signaling of fingertip forces can be perceived as a by-product of an overall optimization process within prevailing biological constraints.”

      One point that would be nice to add to the discussion is the implications of the work for skin sensing. What would you predict for the time constant of relaxation of fingertip skin, how long could these skin memory effects last? Two main points to address here may be how the hydration of the skin and anatomical skin changes related to aging affect the results. If the skin is less viscoelastic, what would be the implications for the firing of mechanoreceptors?

      It is likely that the time constant depends to some extent on mechanical factors of the skin, which will likely change due to age or environmental factors. However, while these questions are intriguing, they fall outside the scope of the current study and we are not aware of studies that have addressed these issues directly in experiments either.

      How long does it take for the effect to end? Again, this will likely depend on the skin's viscoelasticity. However, could the authors use it in a psychophysical paradigm to predict whether participants would be more or less sensitive to future stimuli? In this way, it would be possible to test whether the direction modifies touch perception.

      Time constants for tissue viscoelasticity have been estimated to extend up to several seconds (see citations in the introduction). While direct perceptual effects could indeed be explored through psychophysical experimental paradigms, we are currently unaware of any studies specifically addressing the type of effect described in this study. In addition to the statement that, concerning manipulation and haptic tasks, "to our knowledge, a possible influence of fingertip viscoelasticity on task performance has not been systematically investigated," we have now also addressed tactile psychophysical tasks conducted during passive touch with the following sentence in the text:

      “Similarly, there is a lack of systematic investigation of potential effects of fingertip viscoelasticity on performance in tactile psychophysical tasks conducted during passive touch.”

      Reviewer #2 (Public Review):

      Summary:

      The authors sought to identify the impact skin viscoelasticity has on neural signalling of contact forces that are representative of those experienced during normal tactile behaviour. The evidence presented in the analyses indicates there is a clear effect of viscoelasticity on the imposed skin movements from a force-controlled stimulus. Both skin mechanics and evoked afferent firing were affected based on prior stimulation, which has not previously been thoroughly explored. This study outlines that viscoelastic effects have an important impact on encoding in the tactile system, which should be considered in the design and interpretation of future studies. Viscoelasticity was shown to affect the mechanical skin deflections and stresses/strains imposed by previous and current interaction force, and also the resultant neuronal signalling. The result of this was an impaired coding of contact forces based on previous stimulation. The authors may be able to strengthen their findings, by using the existing data to further explore the link between skin mechanics and neural signalling, giving a clearer picture than demonstrating shared variability. This is not a critical addition, but I believe would strengthen the work and make it more generally applicable.

      Strengths:

      - Elegant design of the study. Direct measurements have been made from the tactile sensory neurons to give detailed information on touch encoding. Experiments have been well designed and the forces/displacements have been thoroughly controlled and measured to give accurate measurements of global skin mechanics during a set of controlled mechanical stimuli.

      - Analytical techniques used. Analysis of fundamental information coding and information representation in the sensory afferents reveals dynamic coding properties to develop putative models of the neural representation of force. This advanced analysis method has been applied to a large dataset to study neural encoding of force, the temporal dynamics of this, and the variability in this.

      Weaknesses:

      - Lack of exploration of the variation in neural responses. Although there is a viscoelastic effect that produces variability in the stimulus effects based on prior stimulation, it is a shame that the variability in neural firing and force-induced skin displacements have been presented, and are similarly variable, but there has been no investigation of a link between the two. I believe with these data the authors can go beyond demonstrating shared variability. The force per se is clearly not faithfully represented in the neural signal, being masked by stimulation history, and it is of interest if the underlying resultant contact mechanics are.

      Thank you for this suggestion. We have added a new section investigating the link between skin deformation and neural firing in more depth via a simple neural model. Please see our answer below in the ‘Recommendations’ section for further details.

      Validity of conclusions:

      The authors have succeeded in demonstrating skin viscoelasticity has an impact on skin contact mechanics with a given force and that this impacts the resultant neural coding of force. Their study has been well-designed and the results support their conclusions. The importance and scope of the work are adequately outlined for readers to interpret the results and significance.

      Impact:

      This study will have important implications for future studies performing tactile stimulation and evaluating tactile feedback during motor control tasks. In detailed studies of tactile function, it illustrates the necessity to measure skin contact dynamics to properly understand the effects of a force stimulus on the skin and mechanoreceptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (Very) minor comments

      - The authors say at the beginning of the Results that, "The fourth type of tactile neurons in the human glabrous skin, fast adapting type II neurons...". Although generally written that there are four types of afferent in the glabrous skin, it would be better to state that these are low-threshold A-beta myelinated mechanoreceptive afferents, at least one time, as there are other types of afferent in the glabrous skin that respond to mechanical stimulation (e.g. low and high threshold C-fibers).

      This is now clarified at the start of the Results section:

      “We recorded action potentials in the median nerve of individual low-threshold A-beta myelinated first-order human tactile neurons innervating the glabrous skin of the fingertip…”

      - Fig. 3: Could you add '(N)' as the measurement of force for Fig. 3A for Fx, Fy, and Fz? Also, please change 'Data was recorded' to 'Data were recorded' in the legend.

      Fixed.

      - At the beginning of the Methods, you say that your study conforms to the Declaration of Helsinki, which actually requires pre-registration in a database. If you did not pre-register your study, please can you add '... in accordance with the Declaration of Helsinki, apart from pre-registration in a database'.

      Thanks for making us aware of this. We have added the suggested qualifier to the ethics statement.

      Reviewer #2 (Recommendations For The Authors):

      The neural representation/encoding of the actual displacement vectors would be a useful addition to the analyses. These vectors have been demonstrated to systematically change with the condition in the irregular series (Figure 2E) and will thus significantly act on the dynamics of induced mechanical changes in the skin with a given interaction force. Thus, it could be examined how the neurons code the magnitude of displacements as well as their direction. An evaluation of the extent to which the imposed displacement magnitudes are encoded in the neural responses would be a useful addition in explaining the signalling of the force events and how the central nervous system decodes these. Evaluating an alternative displacement encoding for comparison to pure force encoding may reveal more about how contact events are represented in the tactile system, which must decode these variable afferent signals to reconstruct a percept of the interaction. It could then be explored how the central nervous system may then scale the dynamic afferent responses based on the background viscoelastic state likely to be present in the SA-II afferent signals (Figure 7) for a context in which to evaluate the dynamic contact forces. This may of course be a complex relationship for the type-I afferents, where the underlying mechanical events evoking the firing (microslips not represented in global forces) have not been measured here. Such a model could be more widely applicable, as the skin viscoelasticity and displacement magnitudes are a straightforward measurement metric and could perhaps be used as a better proxy for neural signalling. This would allow the investigation of a wider variety of forces, and the study of the timing of the viscoelastic effect, both of which have been fixed here. 
      This would give the work a broader impact: rather than just highlighting that this effect produces variability, it could reveal whether this mechanical feature is structured in the neural representation. The categorical encoding/decoding tested here is specific to the stimuli used (magnitudes, intervals), but there is the possibility that this may be more generally applicable (within the bounds of forces/speeds) if the underlying basis of the variability in the signalling produced by the viscoelasticity is identified. Since the time course of the viscoelasticity has not been measured here (fixed forces and intervals), further study is required to fully understand the implications this has for a wider variety of situations.

      We agree that a better understanding of how the mechanical deformations are reflected in the resulting spike trains would be valuable. While ultimately a full understanding will need precise measurements of skin deformation across the whole fingertip to account for mechanical propagation to mechanoreceptor locations, relating the deformations at the contact location with neural firing patterns directly can provide useful hints into which aspects of deformation are encoded and how. To this end, we ran a new analysis that aimed to predict the time-varying neural responses directly from the recorded mechanical movements of the contactor.

      Below we have reproduced the new results and methods text along with the additional figures for this analysis. Note that we have also added text in the Discussion to interpret these findings in the context of our other results.

      New section in Results titled Predicting neural responses from contactor movements: “The similarity in the history-dependent variation in neural firing and fingertip deformation at a given force stimulus suggests that neuronal firing is determined by how the fingertip deforms rather than the applied force itself. However, this similarity does not clarify the relationship between fingertip deformation dynamics and neural signaling. To investigate further, we fit cross-validated multiple linear regression models to evaluate how well distinct aspects of contactor movement could predict the time-varying firing rates of individual neurons during the protraction phases of the irregular sequence. The models used predictors based on (1) the three-dimensional position of the contactor, (2) its three-dimensional velocity, (3) a combination of position and velocity signals, and, finally, (4) position and velocity signals along with all possible two-way interactions between them, capturing potentially complex relationships between fingertip deformations and neural signaling.

      Comparing the variance explained (R<sup>2</sup>) by each regression model for each neuron type revealed clear differences between the models (Figure 5A). A two-way mixed design ANOVA, with regression model as within-group effects and neuron type as a between-group effect revealed a main effect of model on variance explained (F(3,462) = 815.5, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84). Model prediction accuracy overall increased with the number of predictors, with the two-way interaction model outperforming all others (p < 0.001 for all comparisons, Tukey’s HSD). Additionally, a significant main effect of neuron type (F(2,154) = 29.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.28) and a significant interaction between regression model and neuron type were observed (F(6,462) = 50.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.40).

      For neuron type, model predictions were most accurate for SA-2 neurons, followed by SA-1 neurons, with FA-1 neurons showing the lowest accuracy (p < 0.003 for all comparisons, Tukey’s HSD). The interaction between model and neuron type revealed distinct patterns. For SA-1 and SA-2 neurons, position-only and velocity-only models had similar prediction accuracy (p ≥ 0.996, Tukey’s HSD) with no significant differences between these neuron types (p ≥ 0.552, Tukey’s HSD). FA-1 neurons performed poorly with the position-only model but showed higher accuracy with the velocity-only model (p < 0.001, Tukey’s HSD), exceeding that of SA-1 neurons (p = 0.006, Tukey’s HSD). Models combining position and velocity predictors (without interactions) surpassed both position-only and velocity-only models for SA-1 and SA-2 neurons (p < 0.001, Tukey’s HSD). Overall, the differences between neuron types broadly match their tuning to static and dynamic stimulus properties.

      The two-way interaction model, accounting for most variance in neural responses, produced mean R<sup>2</sup> values of 0.75 for FA-1, 0.88 for SA-1, and 0.91 for SA-2 neurons (Figure 5A). To evaluate the contribution of the different predictors, we ranked them using the permutation feature importance method, focusing on the six most important ones. Regression analyses using only these variables explained almost all of the variance explained by the full model, with a median R<sup>2</sup> reduction of just 0.055 across all neurons. Across all neuron types, at least half included all three velocity components (dPx, dPy, dPz) among the top six, with FA-1 neurons showing the highest prevalence (Figure 5B). Interactions between normal position (Pz) and each velocity component were also frequently observed, while interactions involving tangential position and velocity components were less common. Interactions among velocity components were relatively well represented, followed by interactions limited to position components. Position signals were generally less represented, except for normal position (Pz) in slowly adapting neurons, where it appeared in 50% of SA-1 and 68% of SA-2 neurons. Despite these broad trends, important predictors varied widely across ranks even within a given neuron class (see Figure 5-figure supplement 1), and even the most frequent variables appeared in only a subset of cases, suggesting broad variability in sensitivity across neurons.”
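
      The degrees of freedom and effect sizes in the ANOVA reported above can be reproduced from the design. The sketch below assumes the standard relation between F and partial eta squared, η<sub>p</sub><sup>2</sup> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>), and takes the neuron counts (68 SA-1, 38 SA-2, 51 FA-1) from the Methods text; the variable and function names are illustrative.

```python
# Design: 4 regression models (within-group) x 3 neuron types (between-group).
n_neurons = 68 + 38 + 51   # SA-1 + SA-2 + FA-1 neurons entering the regressions
n_types, n_models = 3, 4

df_model = n_models - 1                        # within-group effect
df_type = n_types - 1                          # between-group effect
df_interaction = df_model * df_type            # model x neuron type
df_between_error = n_neurons - n_types         # error term for neuron type
df_within_error = df_model * df_between_error  # error term for model and interaction


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# F values as quoted in the response.
eta_model = partial_eta_squared(815.5, df_model, df_within_error)
eta_type = partial_eta_squared(29.8, df_type, df_between_error)
eta_interaction = partial_eta_squared(50.8, df_interaction, df_within_error)

print(f"model:       F({df_model},{df_within_error}), eta_p^2 = {eta_model:.2f}")
print(f"neuron type: F({df_type},{df_between_error}), eta_p^2 = {eta_type:.2f}")
print(f"interaction: F({df_interaction},{df_within_error}), eta_p^2 = {eta_interaction:.2f}")
```

      The recomputed degrees of freedom and effect sizes agree with the reported F(3,462), F(2,154), and F(6,462) and with η<sub>p</sub><sup>2</sup> values of 0.84, 0.28, and 0.40.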

      New methods paragraph titled Predicting time-varying firing rates from skin deformations:

      “This analysis was conducted in Python (v3.13) with pandas for data handling, numpy for numerical operations, and scikit-learn for model fitting and evaluation.

      To assess how well individual neurons' time-varying firing rates could be predicted from simultaneous contactor movements, we fitted multiple linear regression models (see Khamis et al., 2015, for a similar approach). This analysis focused on the force protraction phase of the irregular sequence, where neurons were most responsive and sensitive to stimulation history. Data from 100 ms before to 100 ms after the protraction phase (between -0.100 s and 0.225 s relative to protraction onset) were included for each trial. Neurons were included if they fired at least two action potentials during the force protraction phase and the following 100 ms in at least five of the 25 trials. This ensured sufficient variability in firing rates for meaningful regression analysis, resulting in 68 SA-1, 38 SA-2, and 51 FA-1 neurons being included.

      Contactor position signals digitized at 400 Hz were linearly interpolated to 1000 Hz. Instantaneous firing rates, derived from action potentials sampled at 12.8 kHz, were resampled at 1000 Hz to align with position signals. A Gaussian filter (σ = 10 ms, cutoff ~16 Hz) was applied to the firing rate as well as to the position signals before differentiation. To account for axonal conduction (8–15 ms) and sensory transduction delays (1–5 ms), firing rates were advanced by 15 ms to align approximately with independent variables.

      Regressions were performed using scikit-learn's Ridge and RidgeCV regressors, which apply L2 regularization to mitigate overfitting. Hyperparameter tuning for the regularization parameter (alpha) was performed using GridSearchCV with a predefined range (0.001–1000.0), incorporating five-fold cross-validation to select the best value. To minimize overfitting risks, model performance was further validated with independent five-fold cross-validation (KFold), and R<sup>2</sup> scores were computed using cross_val_score.

      We constructed four linear regression models with increasing complexity: (1) Position-only, using three-dimensional contactor positions (Px, Py, Pz); (2) Velocity-only, using three-dimensional velocities (dPx, dPy, dPz); (3) Combined, including all position and velocity signals (6 predictors); and (4) Interaction, including all signals and their two-way interactions (21 predictors). All features were standardized using StandardScaler to improve regularization and model convergence. PolynomialFeatures generated second-order interaction terms for the interaction model. Feature importance was evaluated with permutation_importance, and simpler models were built using the most important features. These models were validated through cross-validation to assess retained explanatory power.”
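
      The four-model comparison described in the Methods text above can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It runs on synthetic data in which the "velocity" columns are independent noise rather than derivatives of position. It uses `Ridge` tuned by `GridSearchCV` inside an independent outer `KFold`, and it assumes that interaction terms are generated before standardization and that folds are shuffled across samples; neither detail is stated in the source. The helper `build_search`, the seven-point alpha grid, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for one neuron (NOT the authors' data): six signals
# ordered as Px, Py, Pz, dPx, dPy, dPz.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
# Hypothetical firing rate driven by dPz and by a Pz x dPz interaction.
y = 2.0 * X[:, 5] + 1.5 * X[:, 2] * X[:, 5] + rng.normal(scale=0.5, size=600)


def build_search(with_interactions: bool) -> GridSearchCV:
    """Ridge pipeline whose alpha is tuned by an inner five-fold grid search."""
    steps = []
    if with_interactions:
        # interaction_only=True keeps the 6 signals plus their 15 pairwise
        # products (21 predictors) and omits squared terms.
        steps.append(PolynomialFeatures(degree=2, interaction_only=True,
                                        include_bias=False))
    steps += [StandardScaler(), Ridge()]
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
    alphas = {"ridge__alpha": np.logspace(-3, 3, 7)}  # 0.001 ... 1000
    return GridSearchCV(make_pipeline(*steps), alphas, cv=inner_cv, scoring="r2")


# Column subsets defining the four models of increasing complexity.
models = {
    "position": ([0, 1, 2], False),
    "velocity": ([3, 4, 5], False),
    "combined": ([0, 1, 2, 3, 4, 5], False),
    "interaction": ([0, 1, 2, 3, 4, 5], True),
}

# Independent outer five-fold split, used only for scoring.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)
r2 = {
    name: cross_val_score(build_search(inter), X[:, cols], y,
                          cv=outer_cv, scoring="r2").mean()
    for name, (cols, inter) in models.items()
}

# Number of predictors entering the interaction model.
n_interaction_predictors = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
).fit(X).n_output_features_

for name, score in r2.items():
    print(f"{name:12s} cross-validated R^2 = {score:.2f}")
```

      Keeping the alpha search inside each outer training fold means the reported R<sup>2</sup> is never computed on samples that influenced the choice of regularization. For real recordings, where consecutive samples within a trial are correlated, splitting by trial rather than by sample may be the safer choice.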

      Minor:

      - It would be useful to add a brief description of the material aspects of the contactor tip to the methods (as per Birznieks 2001).

      We have added the following statement:

      “To ensure that friction between the contactor and the skin was sufficiently high to prevent slips, the surface was coated with silicon carbide grains (50–100 μm), approximating the finish of smooth sandpaper.”

      - The axes labelling on Figure 3A and legend description is ambiguous, probably placing the Px, Py, and Pz labels on the far left axes and the Fx, Fy, and Fz on the right side of the far right axes would make this clearer.

      Label placement has been improved along with some other minor fixes.

      - For the quasi-static phase analysis, the phrase "absence of loading" used in reference to the interstimulus period and SA-II afferents does not seem to be a correct description. The finger is still loaded (at least in the normal direction), with a magnitude of imposed displacement that counteracts the viscoelastic force exerted by the skin mechanics of the fingertip. Although there is a zero net-force load, a mechanical stimulus is still being actively applied to the skin.

      We have changed the wording throughout the text and now consistently refer either to the “interstimulus period” directly or to an “absence of externally applied stimulation” to avoid confusion.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The revised paper by Kim et al. reports two disease mutations in proBMP4, S91C and E93G, disrupt the FAM20C phosphorylation site at Ser91, blocking the activation of proBMP4 homodimers, while still allowing BMP4/7 heterodimers to function. Analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced expression of pSmad1 and tbxt1. The expert amphibian tissue transplant studies were expanded to in vivo studies in Bmp4S91C/+ and Bmp4E93G/+ mice, highlighting the impact of these mutations on embryonic development, particularly in female mice, consistent with patient studies. Additionally, studies in mouse embryonic fibroblasts (MEFs) demonstrated that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI modeling using AlphaFold of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292 in a new "Ideas and Speculation" section. Overall, the authors addressed the reviewers' comments, improving the presentation.

      Strengths:

      The strengths of this work continue to lie in the elegant Xenopus and mouse studies that elucidate the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development. Including an "Ideas and Speculation" subsection for mechanistic ideas reduces some shortcomings regarding the analysis of the underlying mechanisms.

      Weaknesses:

      (1)  (Minor) In Figure S1 and lines 165-174 and 179-180, the authors should consider that, unlike the wild-type protein (Ser), which can be reversibly phosphorylated or dephosphorylated, phosphomimic mutations are locked into mimicking either the phosphorylated state (Asp) or the non-phosphorylated state (Ala). Consequently, if the S91D mutant exhibits lower activity than WT, it could imply that S91D interferes with other regulatory constraints, as the authors suggest. However, it may also be inhibiting activation. Therefore, caution is warranted when comparing S91D with S91C to conclude that Ser91 phosphorylation increases BMP4 activity. While additional experiments are not necessary, further consideration is essential.

      (Minor) In lines 394-399, the authors cleverly speculate that pS91 interacts with Arg289-the essential P4 arginine for furin processing. If so, this interaction could hinder the cleavage of proBMP4, as indicated by the results in Figure S1. The discussion would benefit from considering that, contrary to their favored model, dephosphorylation at Ser91 might actually facilitate cleavage.

      We have added a paragraph raising this possibility but explaining why it is unlikely and inconsistent with our in vivo data. The S91D construct was a simple control that was tested in ectopic expression assays and not in vivo.  We can make no conclusions about whether this construct resembles the phosphorylated state or whether it hinders or facilitates cleavage in vivo. The conclusion that dephosphorylation promotes BMP4 cleavage or activity is not compatible with the finding that two mutations associated with birth defects in humans (p.S91C or p.E93G) that are predicted to prevent FAM20C-mediated phosphorylation of the BMP4 prodomain lead to impaired proteolytic maturation of endogenous BMP4 and reduced BMP activity in vivo. 

      (2)  In Figure 4, panels A, E, and I, the proBMP bands in the mouse embryonic lysates and MEFs expressing the mutations show a clear size shift. Are these shifts a cause or a consequence of the lack of cleavage? Regardless, the size shifts should be explicitly noted.

      These intriguing shifts were observed in some but not all biological replicates.  When present, the shifts were not reversed by treatment with phosphatases or deglycosylases, and the shifts were never observed in epitope tagged wild type controls.  We have added a paragraph noting the shifts and our tests of whether they might be due to glycosylation, phosphorylation or epitope tags. 

      (3)  (Minor) In line 314, the authors should consider modifying the wording to: "is required for modulating proprotein convertase..."

      The original wording (“Collectively, our findings are consistent with a model in which FAM20C-mediated phosphorylation of the BMP4 prodomain is not required for folding or exit of the precursor protein from the ER, but is required for proprotein convertase recognition and/or for trafficking to post-TGN compartment(s) where BMP4 is cleaved”) more accurately reflects the model that is supported by our findings. Stating that “phosphorylation ……is required to modulate proprotein convertase recognition and/or trafficking” is vague and leaves open the possibility that it modulates in either direction, which our data do not support as described in point 1 above.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study investigates the role of microtubules in regulating insulin secretion from pancreatic islet beta cells. This is of great importance considering that controlled secretion of insulin is essential to prevent diabetes. Previously, it has been shown that KIF5B plays an essential role in insulin secretion by transporting insulin granules to the plasma membrane. High glucose activates KIF5B to increase insulin secretion resulting in the cellular uptake of glucose. In order to prevent hypoglycemia, insulin secretion needs to be tightly controlled. Notably, it is known that KIF5B plays a role in microtubule sliding. This is important, as the authors described previously that beta cells establish a peripheral sub-membrane microtubule array, which is critical for the withdrawal of excessive insulin granules from the secretion sites. At high glucose, the sub-membrane microtubule array is destabilized to allow for robust insulin secretion. Here the authors aim to answer the question of how the peripheral array is formed. Based on the previously published data the authors hypothesize that KIF5B organizes the sub-membrane microtubule array via microtubule sliding. 

      General comment: 

      This manuscript provides data that indicate that KIF5B, like in many other cells, mediates microtubule sliding in beta cells. This study is limited to in vitro assays and one cell line. Furthermore, the authors provide no link to insulin secretion and glucose uptake and the overall effects described are moderate. Finally, the overall effect of microtubule sliding upon glucose stimulation is surprisingly low considering the tight regulation of insulin secretion. Moreover, the authors state "the amount of MT polymer on every glucose stimulation changes only slightly, often undetectable…. In fact, we observe a prominent effect of peripheral MT loss only after a long-term kinesin depletion (three-four days)". This challenges the view that a KIF5B-dependent mechanism regulating microtubule sliding plays a major role in controlling insulin secretion.

      (1) Our initial study was indeed done in a cell line, which is a normal approach to addressing molecular mechanisms of a phenomenon in a challenging cell model: primary pancreatic beta cells are prone to rapidly dedifferentiate outside of the organism and are hard to genetically modify. To address this reviewer’s comment, in the revised manuscript we now confirm the phenotype in beta cells within intact pancreatic islets from a KIF5B KO mouse model (New Figure 2 – Supplemental Figure 1).

      (2) We agree that testing the effect of microtubule sliding on insulin secretion is an important question. Unfortunately, the experimental design needed to accomplish this task is not straightforward. Importantly, besides microtubule sliding, KIF5B is heavily engaged in insulin granule transport, and GSIS deficiency upon KIF5B inactivation is well documented (e.g. Varadi et al 2002). In this study, we choose not to repeat this GSIS assay because of ample existing data. However, this reported GSIS deficiency could result from a combination of lack of insulin granule delivery to the periphery (previous data) and from the depletion of insulin granules from the periphery due to the loss of the submembrane MT bundle (this study and Bracey et al 2020). In order to exclusively test the role of MT sliding in secretion, a significant investment in mutant tool development would be needed. Ideally, a new mutant mouse model in which insulin granule transport is allowed but MT sliding is blocked must be developed to specifically address this question. To conclude, answering this question will be the subject of a follow-up study.

      (3) We respectfully disagree with the reviewer’s opinion that the effect of MT sliding in beta cells is moderate. As MT networks go, even a slight change in MT configuration often has dramatic consequences. For example, in mitotic spindles, a tiny overgrowth of microtubule ends during metaphase, which causes them to attach to both kinetochores rather than just one, is very significant for the efficiency of chromosome segregation, causing aneuploidy and cancer. The changes in beta-cell MT networks that we are reporting are much stronger: the effect on the peripheral MT network accumulated over three days of KIF5B depletion is dramatic (Fig 2 B, C). Short-term gross MT network configurations after a single glucose stimulation are harder to detect, but MTs at the cell periphery are, in fact, destabilized and fragmented, as we and others have previously reported (Ho et al 2020, Mueller et al 2021). Preventing this MT rearrangement completely blocks GSIS (Zhu et al 2015, Ho et al 2020).

      One of the most fascinating features of insulin secretion regulation is that the amount of generated insulin granules significantly exceeds the normal physiological needs for insulin secretion (~100 times more than needed). At the same time, even slightly facilitated glucose depletion can be devastating. Accordingly, the excessive insulin content of a beta cell resulted in the development of multiple levels of control, preventing excessive secretion. Our previous data suggest that the peripheral MT array provides one of those mechanisms. This study indicates that microtubule sliding is necessary to form the proper peripheral network in the long term. Short-term glucose-induced changes in the peripheral MT array likely need to be subtle to prevent over-secretion. Thus, we are not surprised that a dramatic effect of sliding inhibition is only detectable by our approaches after the changes in the MT network accumulate over time. In the revised paper, we now discuss the potential impact of peripheral MT sliding on positive and negative regulation of secretion and add a schematic model illustrating these processes.

      Specific comments: 

      (1) Notably, the authors have previously reported that high glucose-induced remodeling of microtubule networks facilitates robust glucose-stimulated insulin secretion. This remodeling involves the disassembly of old microtubules and the nucleation of new microtubules. Using real-time imaging of photoconverted microtubules, they report that high levels of glucose induce rapid microtubule disassembly preferentially in the periphery of individual β-cells, and this process is mediated by the phosphorylation of microtubule-associated protein tau. Here, they state that the sub-membrane microtubule array is destabilized via microtubule sliding. What is the relevance of the different processes? 

      In this comment, the summary of our previous conclusions is correct, but the conclusion of this current study is re-stated incorrectly. Indeed, we have previously shown that in high glucose, MTs are destabilized at the cell periphery and nucleated in the cell interior. However, this current paper does not state that “the sub-membrane microtubule array is destabilized via microtubule sliding”. To answer this reviewer’s question, our data support a model where, during glucose stimulation, MT sliding within the peripheral bundle might move fragments of MTs severed by other mechanisms. Importantly, we propose that MT sliding restores the partially destabilized peripheral bundle by delivery of MTs that are nucleated at the cell interior and incorporating them into that bundle. In our overall model, three processes (destabilization, nucleation, and sliding to restore the bundle) are coordinated to maintain beta cell fitness on each GSIS cycle.

      (2) On one hand the authors describe how KIF5B depletion prevents sliding and the transport of microtubules to the plasma membrane to form the sub-membrane microtubule array. This indicates KIF5B is required to form this structure. On the other hand, they describe that at high glucose concentration, KIF5B promotes microtubule sliding to destabilize the sub-membrane microtubule array to allow robust insulin secretion. This appears contradictory. 

      We never intended to give the impression that MT sliding destabilizes the sub-membrane bundle. We apologize if our wording caused this misunderstanding of our model. We propose that while the bundle is destabilized downstream of glucose signaling (e.g. due to tau phosphorylation, please see Ho et al Diabetes 2020), MT sliding remodels the bundle and thereafter rebuilds it to prevent over-secretion. In the revised manuscript, we have double-checked the whole text to make sure that such misunderstanding is avoided.

      (3) Previously, it has been shown that KIF5B induces tubulin incorporation along the microtubule shaft in a concentration-dependent manner. Moreover, running KIF5B increases microtubule rescue frequency and unlimited growth of microtubules. Notably, KIF5B regulates microtubule network mass and organization in cells (PMID: 34883065). Consequently, it appears possible that the here observed phenomena of changes in the microtubule network might be due to alterations in these processes. 

      We thank the reviewer for proposing this alternative explanation for the observed change in microtubule networks after KIF5B depletion. We have now directly tested this possibility. Namely, we have re-expressed the kinesin-1 motor domain in MIN6 cells depleted of KIF5B. This motor domain construct by itself is not capable of driving microtubule sliding because it lacks the tail domain. At the same time, it is known to move very efficiently along microtubules and should provide the effects reported in the article cited by the reviewer. We found that the re-expression of the kinesin motor domain does not rescue microtubule network defects in beta cells (see new Figure 2 – Supplemental Figure 2). Thus, we conclude that the effects of kinesin depletion on the microtubule network in beta cells are due to the lack of microtubule sliding, as reported here.

      (4) The authors provide data that indicate that microtubule sliding is enhanced upon glucose stimulation. They conclude that these data indicate that microtubule sliding is an integral part of glucose-triggered microtubule remodeling. Yet, the authors fail to provide any evidence that this process plays a role in insulin secretion or glucose uptake. 

      We would like to point out that we do not “fail” but rather choose not to overload our study by repeating insulin secretion assays in KIF5B-inactivated cells because this would not have been very informative. It has been found previously that kinesin-1 inactivation or knockout significantly attenuates insulin secretion because kinesin-1 is actively transporting insulin granules and kinesin-1 activity is enhanced under high glucose conditions (e.g. Varadi et al 2002, Cui et al., 2011, Donelan et al, 2002). That said, our current finding is very much in line with these previous data. When kinesin is depleted, two things would be happening at the same time: in the absence of sub-membrane microtubule bundle pre-existing insulin granules would be over-secreted, and new insulin would not be delivered to the periphery, both decreasing GSIS. Unfortunately, we do not have tools yet that would allow us to dissect which part of the insulin secretion defect is due to prior over-secretion (the consequence of deficient MT sliding) and which part is due to the lack of new granule delivery. We plan to develop such tools in the future and elaborate on them in a follow-up study. Here, our goal is to understand microtubule organization principles in beta cells, and we choose not to extend the scope of the current study to metabolic assays.  

      (5) The authors speculate that the sub-membrane microtubule array prevents the over-secretion of insulin. Would one not expect in this case a change in the distribution of insulin granules at the plasma membrane when this array is affected? Or after glucose stimulation? Notably, it has been reported that "the defects of β-cell function in KIF5B mutant mice were not coupled with observable changes in islet morphology, islet cell composition, or β-cell size" and "the subcellular localization of insulin vesicles was found to not be affected significantly by the decreased Kif5b level. The cytoplasm of both wild-type and mutant β-cells was filled with insulin vesicles. Insulin vesicle numbers per square μm were determined by counting all insulin vesicles in randomly photographed β-cells. More insulin granules were found in Kif5b knockout β-cells compared with control cells. This phenomenon is consistent with the observation that insulin secretion by β-cells is affected" whereby "Insulin vesicles (arrowheads) were distributed evenly in both mutant and control cells" (PMID: 20870970).  

      Quantitative analyses in the study cited by the reviewer do not include assays that would be relevant to our study. Particularly, in that study neither the amount of insulin granules at the cell periphery nor the ratio between the number of granules at the periphery and the beta cell interior has been analyzed. In addition, in our preliminary observations not shown here, insulin content in beta cells in KIF5B KO mice is highly heterogeneous, with a subpopulation of cells severely depleted of insulin. This opens a new avenue of investigation into beta cell heterogeneity, which is out of the scope of this current study. Thus, we chose to restrict this current study to microtubule organization data.   

      (6) Does the sub-membrane microtubule array exist in primary beta cells (in vitro and/or in vivo) and how it is affected in KIF5B knockout mice?  

      Yes, it does exist. In fact, we have first reported it in mouse islets (Bracey et al 2020, Ho et al 2020). Now, we report that the sub-membrane bundle is defective, and microtubules are misaligned in KIF5B KO mice (new Figure 2 – Supplemental Figure 1).

      Reviewer #2 (Public Review): 

      In this article, Bracey et al. provide insights into the factors contributing to the distinct arrangement observed in sub-membrane microtubules (MTs) within mouse β-cells of the pancreas. Specifically, they propose that in clonal mouse pancreatic β-cells (MIN6), the motor protein KIF5B plays a role in sliding existing MTs towards the cell periphery and aligning them with each other along the plasma membrane. Furthermore, similar to other physiological features of β-cells, this process of MTs sliding is enhanced by a high glucose stimulus. Because a precise alignment of MTs beneath the cell membrane in β-cells is crucial for the regulated secretion of pancreatic enzymes and hormones, KIF5B assumes a significant role in pancreatic activity, both in healthy conditions and during diseases. 

      The authors provide evidence in support of their model by demonstrating that the levels of KIF5B mRNA in MIN6 cells are higher compared to other known KIFs. They further show that when KIF5B is genetically silenced using two different shRNAs, the MT sliding becomes less efficient. Additionally, silencing of KIF5A in the same cells leads to a general reorganization of MTs throughout the cell. Specifically, while control cells exhibit a convoluted and non-radial arrangement of MTs near the cell membrane, KIF5B-depleted cells display a sparse and less dense sub-membrane array of MTs. Based on these findings, the Authors conclude that the loss of KIF5B strongly affects the localization of MTs to the periphery of the cell. Using a dominant-negative approach, the authors also demonstrate that KIF5B facilitates the sliding of MTs by binding to cargo MTs through the kinesin-1 tail binding domain. Additionally, they present evidence suggesting that KIF5B-mediated MT sliding is dependent on glucose, similar to the activity levels of kinesin-1, which increase in the presence of glucose. Notably, when the glucose concentrations in the culturing media of MIN6 cells are reduced from 20 mM to 5 mM, a significant decrease in MT sliding is observed. 

      Strengths:

      This study unveils a previously unexplained mechanism that regulates the specific rearrangement of MTs beneath the cell membrane in pancreatic β-cells. The findings of this research have implications and are of significant interest because the precise regulation of the MT array at the secretion zone plays a critical role in controlling pancreatic function in both healthy and diseased states. In general, the author's conclusions are substantiated by the provided data, and the study demonstrates the utilization of state-of-the-art methodologies including quantification techniques, and elegant dominant-negative experiments. 

      Weaknesses:

      A few relatively minor issues are present and related to data interpretation and the conclusions drawn in the study. Namely, some inconsistencies between what appears to be the overall and sub-membrane MT array in scramble vs. KIF5B-depleted cells, the lack of details about the sub-cellular localization of KIF5B in these cells and the physiological significance of the effect of glucose levels in beta-cells of the pancreas. 

      We thank the reviewer for this insightful review. In the revised version, we provided re-worded and extended interpretations and conclusions to prevent any issues or misunderstandings. We trust that while some noted apparent inconsistencies may reflect the intrinsic heterogeneity of the beta cell population, all data presented here indicate the same trend in phenotypes. In the revised version, we have provided additional cell views and, in places, alternative representative images and videos, to clear out any apparent inconsistencies. We also would like to point out that we in fact reported KIF5B localization: not surprisingly, KIF5B predominantly localized to insulin granules and the punctate staining fills the whole cytoplasm (Figure 2A, bottom panel). However, as pointed out in detail in our response to reviewer 1, we choose to leave out an extensive study of the physiological and metabolic consequences of the reported microtubule network dynamics to a follow-up study.

      Reviewer #3 (Public Review): 

      Prior work from the Kaverina lab and others had determined that beta-cells build a microtubule network that differs from the canonical radial organization typical in most mammalian cell types and that this organization facilitates the regulated secretion of insulin-containing secretory granules (IGs). In this manuscript, the authors tested the hypothesis that kinesin-driven microtubule sliding is an underlying mechanism that establishes a sub-membranous microtubule array that regulates IG secretion. They employed knock-down and dominant-negative strategies to convincingly show microtubule sliding does, in fact, drive the assembly of the sub-membranous microtubule band. They also used live cell imaging assays to demonstrate that kinesin-mediated microtubule sliding in beta-cells is triggered by extracellular high glucose. Overall, this is an interesting and important study that relates microtubule dynamics to an important physiological process. The experiments were rigorous and well-controlled. 

      We truly appreciate this reviewer’s opinion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figures: 

      (1) Figure 1: 

      a) Why can one not see here, and in most following images, the peripheral sub-membrane microtubule array? One can also not see an accumulation of microtubules in the cell interior. 

      The microtubule pattern in beta cells is variable, and the sub-membrane array is seen in the whole population to a variable extent (see directionality histogram in Figure 2E for statistics). In fact, an array of peripheral MTs parallel to the cell border is present in the example shown in Figure 1 and in all following control images. To make it clearer, we now show the pre-bleach images in Figure 1 D-F at a lower magnification, so that the differences in MT density at the cell periphery and cell center are more clearly seen: MTs are lacking at the periphery in KIF5B-depleted cells but not in control cells.

      b) 5 min appears to be a long time and enough time to polymerize a significant number of new microtubules. 

      We interpret this comment as the reviewer’s concern that in FRAP assays, fluorescently-labeled MTs moving into the bleached area might be newly polymerizing MTs rather than pre-existing MTs relocated into that area. However, this is not the case because newly polymerized MTs contain predominantly quenched “dark” tubulin molecules and only a small percent of fluorescent tubulin. These dim MTs are not included in MT sliding assay analysis, where a threshold for bright MTs is introduced. We have now added more details on the quantification of these data to the Materials and Methods section.

      c) The overall effects appear minor. It is unclear how Fig. 1-Suppl-Fig.1, where no significant difference is shown, is translated into Figure 1 J and K showing a significant difference. 

      With all due respect, we do not agree that the effect is minor. Please see our response to the Public Review where we discuss the major consequences of MT defects in detail. 

      To answer this specific comment, we show that there are significant differences in the number of rapidly moving MTs (5-sec displacement over 0.3 µm) and in the amount of stationary MTs (5-sec displacement below 0.15 µm). There is no significant difference in the amount of slightly displaced MTs (displacements between 0.15 and 0.3 µm; the central part of the histogram). This might indicate that these slight displacements do not depend on the kinesin-1 motor but rather are caused by experimental noise, pushing by moving organelles, and/or myosin-dependent forces in the cell. In the revised manuscript, we have detailed this quantification more clearly in Methods and included it in the Figure legends.
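
      The three displacement classes described above can be sketched as a simple binning step. This is an illustrative sketch only, not the authors' analysis code: the function names are hypothetical, and the handling of values that fall exactly on a threshold (assigned here to the middle class) is an assumption, since the text does not specify it.

```python
from collections import Counter


def classify_displacement(d_um, stationary_max=0.15, moving_min=0.3):
    """Bin one 5-sec displacement (in micrometers) into a class.

    Thresholds follow the values quoted in the text; boundary values
    are placed in the "slight" class by assumption.
    """
    if d_um > moving_min:
        return "moving"
    if d_um < stationary_max:
        return "stationary"
    return "slight"


def class_fractions(displacements_um):
    """Fraction of displacements in each class (input must be non-empty)."""
    counts = Counter(classify_displacement(d) for d in displacements_um)
    total = len(displacements_um)
    return {label: counts[label] / total
            for label in ("moving", "slight", "stationary")}
```

      Comparing the "moving" and "stationary" fractions between control and depleted cells then corresponds to the two tails of the displacement histogram, while the "slight" fraction corresponds to its central part.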

      d) The authors utilize single molecule tracking to further strengthen their conclusion that KIF5B promotes microtubule sliding. The observed effects are weaker than the data obtained from photobleaching experiments. The videos clearly show that there is still significant movement also in KIF5B-depleted cells. If K560RigorE236A binds irreversibly to a microtubule and this microtubule is growing (not only by the addition of tubulin dimers to the plus end; see PMID: 34883065) wouldn't that also result in movement of the tagged K560RigorE236A? As KIF5B is also required in the transport of insulin granules, it should also label "interior microtubules". And in Video 2 it appears that pretty much all "labeled" microtubules are moving. 

      K560RigorE236A forms fiducial marks along the whole MT lattice, as previously shown in (Tanenbaum et al., 2014). When it is bound to the MT lattice, K560RigorE236A moves with the whole MT if it is being relocated. The mechanism described in (PMID: 34883065) appears to be absent or minor in beta cells (see Figure 2 - Supplemental Figure 2); thus, even if this mechanism could displace already polymerized MTs, this is not happening in this cell type.

      The reviewer is correct: K560RigorE236A does mark all MTs throughout a beta cell. All MTs are moving slightly in a living cell because they are pushed around by moving organelles, actin contractility, etc. MTs may also be slid by other MT-dependent motors (dynein against the membrane and such). So, it is not surprising that the MT network is “breezing,” and kinesin-dependent sliding is only a part of MT movement. What we show here is that the KIF5B-dependent MT sliding is responsible for a relatively “long-distance” relocation of MTs manifested in long, directional displacement of fiducial marks. This does not exclude other movements. This makes extraction of kinesin-dependent MT movements somewhat challenging, of course, which is why we needed to do those extensive analyses.

      e) Figure 1 G to K is misleading, at least in the context of the provided videos. There are several microtubules that move extensively in shRNA#2-treated cells and overall there appears more movement in this cell as in the control cell. Figure 1I is clearly not representative of the movement shown in Video 2. 

      We apologize if our selection of representative movies/figures for this experiment was imperfect. Indeed, in all depleted cells, SunTag puncta still move to a certain extent, either due to incomplete depletion or to alternative intracellular forces dislocating microtubules. However, there is a clear difference in the fraction of persistently moving puncta (please see Figure 1K and the histogram in Figure 1 - Supplemental Figure 1B). Unfortunately, when the number of SunTag puncta per cell is variable, it sometimes prevents a good visual perception of the actual distribution of moving versus stationary microtubules. We now show an alternative representative movie for Figure 1I and the corresponding Video 2, with the goal of comparing cells with more consistent numbers of SunTag puncta.

      (2) Figure 2A. 

      a) This is the only image that clearly shows the existence of a sub-membrane microtubule array and the concentration of microtubules in the cell interior. The differences are unclear between the experimental setups including the length of cultivation and knockdown of KIF5B or expression of mutants. 

      We now provide a more detailed description of each image acquisition and processing in Materials and Methods. In brief, while the morphology of MT patterns is intrinsically variable in beta cells, all control cells have populated peripheral MTs that exhibit a more parallel configuration as compared to depletions and mutants.

      b) The authors state "While control cells had convoluted non-radial MTs with a prominent sub-membrane array, typical for beta cells (Fig. 2A), KIF5B-depleted cells featured extra-dense MTs in the cell center and sparse reseeding MTs at the periphery (Fig. 2B, C)". Could that not be explained with the observation that "Kinesin-1 controls microtubule length" (PMID: 34883065)? 

      Thank you for this interesting alternative idea. It does not appear to be the case for beta cells.

      Please see Figure 2 - Supplemental Figure 2 and our response to Public Review Comment #3.

      Also, our apologies for the typo in the original manuscript: the word should be “receding”, not “reseeding”.

      (3) Figure 3: 

      a) This is an elegant way to determine whether KIF5B is involved in microtubule sliding independent of the fact that the effect appears very small. 

      Thank you!            

      b) The assay depends on ectopic expression of a dominant negative mutant. It appears important to show that KIFDNwt is high enough expressed to indeed block the binding of endogenous KIF5B. The authors need to provide a control for this. Furthermore, authors need to provide evidence that other functions of KIF5B are not impaired such as transport of insulin granules and tubulin incorporation or microtubule stability and length.

      Expression of cargo-binding motor domains routinely causes a dominant-negative effect on their cargo transport. This exact construct has been used for the purpose of dominant-negative action previously (Ravindran et al., 2017). It does prevent the membrane cargo binding of KIF5B (Ravindran et al., 2017), thus the transport of insulin granules is also impaired in overexpressing cells. Confirming this fact would not influence our study conclusions, so we chose not to repeat these assays for the sake of time.

      c) N-numbers should be similar. The data for KIFDNmut are difficult to interpret with possibly 2 experiments showing little to no displacement and 3 showing displacement. 

      In the revised manuscript, additional data have been added to increase N-numbers.

      (4) Figure 4 and supplements: The morphology of the KIFDNwt cells is greatly affected and this makes it difficult to say whether the effect on microtubules at the cell periphery is a direct or indirect effect. 

      Yes, these cells often have a less spread appearance, obscuring visual perception of MT distribution. We have now replaced the image of the KIFDNwt cell (Figure 4, Supplemental Figure 1 A) with a more visually representative example.

      Things to do: 

      (1) Notably, the authors have previously reported that high glucose-induced remodeling of microtubule networks facilitates robust glucose-stimulated insulin secretion. This remodeling involves the disassembly of old microtubules and the nucleation of new microtubules. Here, they state that the sub-membrane microtubule array is destabilized via microtubule sliding. What is the relevance of the different processes? Please discuss these in the manuscript. 

      Thank you, we have now extended our discussion of these points and our prior findings. We have also added a schematic model figure for clarity (Figure 7).  

      (2) 5 min appears to be a long time and enough time to polymerize a significant number of new microtubules. Do the authors have any information about the speed of MT formation in MIN6 cells? Can the authors repeat this experiment by preventing MT polymerization? Or repeat the experiment with EB1/EB3 reporter to visualize microtubule growth in the same experimental setting? 

      While some MT polymerization will happen in this timeframe, newly polymerized MTs contain predominantly quenched “dark” tubulin molecules and only a small percent of fluorescent tubulin. These dim MTs are not included in MT sliding assay analysis, where a threshold for bright MTs is introduced. We apologize for initially omitting certain details from the FRAP assay analysis. Now these details have been added.   

      Are the microtubules shown on the cell surface (TIRF microscopy) or do we see here all microtubules? 

      Please see Materials and Methods for microscopy methods and image processing for each figure. Specifically, FRAP assays show a maximum intensity projection of spinning disk confocal stacks over 2.4µm in height (approximately the ventral half of a cell).
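      As a rough illustration of the projection step described above (not the authors' actual pipeline), a maximum intensity projection over the ventral part of a z-stack can be sketched in Python with NumPy. Only the 2.4 µm height comes from the response; the 0.2 µm z-step, the array shape, the slice ordering, and the function name `max_projection` are assumptions made for this example.

```python
import numpy as np


def max_projection(stack, z_step_um, height_um):
    """Maximum intensity projection of the lowest `height_um` of a z-stack.

    stack is assumed to have shape (z, y, x), with index 0 at the ventral
    (coverslip) side of the cell.
    """
    # Number of slices that span the requested height (at least 1, at most all).
    n_slices = int(round(height_um / z_step_um))
    n_slices = max(1, min(n_slices, stack.shape[0]))
    # For each (y, x) pixel, keep the brightest value across the chosen slices.
    return stack[:n_slices].max(axis=0)


# Hypothetical 20-slice stack acquired at an assumed 0.2 µm z-step.
rng = np.random.default_rng(0)
stack = rng.integers(0, 4096, size=(20, 64, 64), dtype=np.uint16)
projection = max_projection(stack, z_step_um=0.2, height_um=2.4)
```

      With these assumed settings the projection covers the 12 lowest slices and returns a single 2D image per time point.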

      (3) Previously, it has been shown that KIF5B induces tubulin incorporation along the microtubule shaft in a concentration-dependent manner. Moreover, running KIF5B increases microtubule rescue frequency and unlimited growth of microtubules. Notably, KIF5B regulates microtubule network mass and organization in cells (PMID: 34883065). Consequently, it appears possible that the here observed phenomena of changes in the microtubule network might be due to alterations in these processes. Authors need to exclude these possibilities and discuss them. 

      Thank you for this interesting alternative idea. It does not appear to be the case for beta cells. Please see Figure 2-Supplemental Figure 2  and our response to Public Review Comment #3.

      (4) It is important that the authors describe in the text and possibly in the figure legends the differences between the experimental set-ups including the length of cultivation and knock down of KIF5B or expression of mutants. 

      Thank you, please see these details in the text (Materials and Methods section).

      (5) Figure 5: Does KIF5B depletion rescue the kinesore-induced defects 

      Thank you for suggesting this control. We have now conducted corresponding experiments. The answer is yes, it does. Kinesore does not induce detectable changes in MT patterns in KIF5B-depleted cells (new Figure 5-Supplemental Figure 2). 

      (6) Can the authors block kinesin-1 resulting in microtubule accumulation in the cell center and then release the block, and best inhibiting microtubule formation, to see whether the microtubules accumulated in the cell center will be transported to the periphery? 

      This proposed experiment would have been a nice illustration for the study; however, it has proven to be too challenging. Unfortunately, we have to leave it for future studies. Nevertheless, the experiments already included in the paper are sufficient to prove our conclusions. 

      Minor comments: 

      (1) The English needs to be improved. Often it is unclear what the authors try to convey. The manuscript is difficult to read and contains several overstatements. 

      The revised manuscript has been through several rounds of proof-reading for clarity.

      (2) It is important to describe in more detail in the introduction what is known about KIF5B in beta cells. Previously, it has been demonstrated that silencing, or inactivation by a dominant negative form of KIF5B, blocks the sustained phase of glucose-stimulated insulin secretion (PMID: 9112396, PMID: 12356920, PMID: 20870970). 

      Yes, this is of course very important and has been cited in the original manuscript. We have now expanded the discussion of the matter.

      (3) Figure 1B and Fig. 1 Suppl Fig.1: Please provide band sizes and provide information on the size of KIF5B. 

      We have replaced Fig. 1B and Suppl Fig 1A with quantitative analysis of KIF5B depletion, now found in new Fig. 1B and Suppl Fig. 1A-C. 

      (4) It is important to state the used glucose concentrations in Figure 1D (based on the methods section it is probably 25 mM glucose) and all subsequent experiments. Is this correct and comparable to Figure 6A or B? For the non-specialized reader, more information should be provided on why initial glucose starvation is performed.  

      Cell culture models of pancreatic beta cells are routinely maintained at glucose levels that are considered “high”, or stimulatory for secretion. This is needed to prevent the loss of the cells’ capacity to respond to glucose stimulation over generations. In order to test GSIS, cells need to be equilibrated at low (fasting, standardly 2.8 mM) glucose levels for several hours, so that they are capable of secreting insulin upon glucose addition. 25 mM glucose is normally used to stimulate GSIS in cell culture models of beta cells, like MIN6. This is a higher concentration than what is needed to stimulate primary beta cells in islets.

      Reviewer #2 (Recommendations For The Authors): 

      I have the following specific questions that pertain to data interpretation and the conclusions drawn.

      (1) The morphology of the overall MT array before the bleach treatment in both control cells and KIF5B-KD cells depicted in Figure 1D-F and Figure 2A-C appears to be distinct. In Figure 1, it seems that the absence of KIF5B results in a general augmentation of MT mass, whereas the arrangement presented in Figure 2 indicates the contrary. Even in the sub-membrane areas, this phenomenon appears to hold true. However, the images used in this study, which depict entire cells or a significant portion of cells, may not be ideal for visualizing the sub-membrane regions.

      It would be beneficial if the author could offer some explanations for this apparent inconsistency. 

      While the beta cell population is intrinsically heterogeneous, all data presented here indicate the same trend in phenotypes. Possibly, some apparent inconsistency between Figures 1 and 2 appeared because in the original manuscript we did not show the pre-bleach whole-cell overview in Figure 1. In the revised version, we now show the whole cells for pre-bleach so that MT organization at the cell periphery can be assessed. Please note that in the control cell, MTs are more or less equally distributed over the cell, while in KIF5B depletions the cell periphery is significantly less populated than the cell center. Furthermore, we did not detect MT mass augmentation or increase in KIF5B depletions. One possible explanation for the reviewer’s impression from Figure 2 is that Figure 2 F-H shows thresholded images where the threshold was adjusted to highlight peripheral MTs in each cell. Please note that this is not the same threshold for each cell (see Figure 2 - Supplemental Figure 2 and 3). Thus, KIF5B-depleted cells that have fewer MTs at the periphery appear brighter in these thresholded images. For the true comparison of MT intensity, please see Figure 2 A-C (grayscale image, not the threshold).

      (2) It would be helpful if the author could provide a visual representation or comment on the sub-cellular localization of KIF5B in MIN6 cells. Is it predominantly localized in the submembrane region, or is it more evenly distributed throughout the cytoplasm? 

      Please see Fig 2A, lower panel. KIF5B is seen across the cell as a punctate staining, in agreement with previous findings that it mostly localizes at IGs.

      (3) The alteration in microtubule (MT) organization and sliding in the absence of KIF5B seems to initiate in proximity to the apparent microtubule organizing center (MTOC) depicted in Figure 2A, and then "simply" extends towards the sub-membrane region. Although the authors acknowledge it, it would be advantageous for the readers to have a clearer indication that the sub-membrane microtubule (MT) reorganization in the absence of KIF5B is a result of a broader MT reorganization rather than a specific occurrence restricted to the sub-membrane regions. 

      Thank you for this comment. We have now extended our discussion to state our conclusions and interpretations of this point more clearly. We have also added a schematic Figure 7 as an illustration. 

      (4) Regarding the "glucose experiments," it is common to add 20-25 mM glucose to culture media, but physiological concentrations of glucose typically hover around 5 mM. Therefore, it is somewhat unclear what the implications are when investigating the impact of KIF5B depletion on MT sliding at 2.8 mM of glucose. It would be helpful if the authors could provide some commentary on this matter, particularly in relation to physiological and pathological conditions. 

      2.8 mM glucose is a standard low glucose condition used to model glucose deprivation/fasting. For functional primary beta cells within pancreatic islets, GSIS can be triggered by glucose stimulation as low as 8-12 mM glucose. However, for glucose stimulation of cultured beta cells such as MIN6 used in this paper, 20-25 mM glucose is standardly used because these cell lines have a higher threshold of stimulation compared to primary beta cells and whole islets.

      (5) In supplementary Figure 1A, it would be helpful if the lanes in the WB were marked indicating what is what. In my observation, it appears that Supplementary Figure 1A, particularly lanes #2, 3, and 4, display the GAPDH protein (MW 36 kDa) (or is it alpha-tubulin, as mentioned in the Material and Methods section and indicated in lane #409?) relative to Figure 1A. I am curious about KIF5B (MW 108 kDa). Is it represented by the upper band? Did the author probe the same membrane simultaneously with two different primary antibodies? This should be clarified, and the author should indicate the molecular weight of the ladder. 

      Indeed, in the original WB two antibodies were used together, due to a challenge in collecting a sufficient number of shRNA-expressing beta cells. This caused confusion and improper interpretation of the loading control. We thank the reviewer for catching this. We have now replaced old Fig. 1B and Suppl. Fig. 1A with quantitative analysis of KIF5B depletion based on single-cell immunofluorescent staining. It is now found in new Fig. 1B and Suppl Fig. 1A-C.  

      Reviewer #3 (Recommendations For The Authors): 

      In all of the figures that present microtubule orientations (e.g. Figure 2E) the error bars obscure the vertical bins making them difficult to read or interpret. If they were rendered at a larger scale, it would be easier to read and interpret these results. 

      Thank you for pointing this out. We now show these histograms with a different format of error bars and without outliers that obscure the view. A variant with outliers is now shown in the supplement. 

      Some of the callouts to the videos in the paper are inaccurate. Perhaps the authors reordered sections of the paper but failed to correctly renumber the video citations? 

      Thank you for this comment, we have corrected all callouts now.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This short report shows that the transcription factor gene mirror is specifically expressed in the posterior region of the butterfly wing imaginal disk, and uses CRISPR mosaic knock-outs to show it is necessary to specify the morphological features (scales, veins, and surface) of this area.

      Strengths:

      The data and figures support the conclusions. The article is swiftly written and makes an interesting evolutionary comparison to the function of this gene in Drosophila. Based on the data presented, it can now be established that mirror likely has a similar selector function for posterior-wing identity in a plethora of insects.

      We thank the reviewer for their feedback.

      Weaknesses:

      This first version has minor terminological issues regarding the use of the terms "domains" and "compartment".

      We acknowledge that the terminologies “domains” and “compartments” might lead to confusion. To avoid confusion we have removed the term “compartment” from the manuscript.

      Reviewer #2 (Public Review):

      This is a short and unpretentious paper. It is an interesting area and therefore, although much of this area of research was pioneered in flies, extending basic findings to butterflies would be worthwhile. Indeed, there is an intriguing observation but it is technically flawed and these flaws are serious.

      The authors show that mirror is expressed at the back of the wing in butterflies (as in flies). They present some evidence that it is required for the proper development of the back of the wing in butterflies (a region dubbed the vannus by the ancient guru Snodgrass). But there are problems with that evidence. First, concerning the method, using CRISPR they treat embryos and the expectation is that the mirror gene will be damaged in groups of cell lineages, giving a mosaic animal in which some lines of cells are normal for mirror and others are not. We do not know where the clones or patches of cells that are defective for mirror are because they are not marked. Also, we do not know what part of the wing is wild type and what part is mutant for mirror. When the mirror mutant cells colonise the back of the wing and that butterfly survives (many butterflies fail to develop), the back of the wing is altered in some selected butterflies. This raises a second problem: we do not know whether the rear of the wing is missing or transformed. From the images, the appearance of the back of the wing is clearly different from the wild type, but is that due to transformation or not? And then I believe we need to know specifically what the difference is between the rear of the wing and the main part. What we see is a silvery look at the back that is not present in the main part, is it the structure of the scales? We are not told.

      Thank you for this feedback. We appreciate that many readers may not be accustomed to looking at mosaic knockouts. As discussed in a previous review article (Zhang & Reed 2017), we rely on a combination of contralateral asymmetry and replicates to infer mutant phenotypes. For many genes (e.g. pigmentation enzymes) mutant clones are obvious, but for other types of genes (e.g. ligands) clone boundaries are sometimes not directly diagnosable. It is simply a limitation of our study system. Nonetheless, you see for yourself that “the back of the wing is altered in some butterflies” – the effects of deleting mirror are clear and repeatable.

      In terms of interpreting mutant phenotypes, we agree that the paper would benefit from a better description of the specific effects. Therefore, we have included an improved, more systematic description of phenotypes, along with better-annotated figures showing changes in wing shape and venation, scale coloration, and color pattern transformation (e.g. posterior elongation of the orange marginal stripes).

      There are other problems. Mirror is only part of a group of genes in flies and in flies both iroquois and mirror are needed to make the back of the wing, the alula (Kehl et al). What is known about iro expression in butterflies?

      In Drosophila, mirror, araucan, and caupolican comprise the so-called Iroquois Complex of genes. As denoted in Figure S4 and in Kerner et al (doi: https://doi.org/10.1186/1471-2148-9-74) the divergence of araucan and caupolican into two separate paralogs is restricted to Drosophila. As in most insects, butterflies have only two Iroquois Complex genes: araucan and mirror. We tested the role of araucan in Junonia coenia as shown in our pre-print: https://doi.org/10.1101/2023.11.21.568172. Its expression appears to be restricted to early pupal wings where it is transcribed in all scale-forming cells. Mosaic araucan KOs resulted in a change in scale iridescent coloration associated with changes in the laminar thickness of scale cells.  

      In flies, mirror regulates a late and local expression of dpp that seems to be responsible for making the alula. What happens in butterflies? Would a study of the expression of Dpp in wildtype and mirror compromised wings be useful?

      We thank the reviewer for the proposal and agree that a future study comparing Dpp in wild-type versus mirror KO butterflies would be useful to clarify the mechanism of Dpp signalling in wing development. It is not clear, however, that the results of a Dpp experiment would change the conclusions of our current study; therefore, we decided not to undertake these additional experiments for our revision.

      Thus, I find the paper to be disappointing for a general journal as it does little more than claim what was discovered in Drosophila is at least partly true in butterflies. 

      We respect that the reviewer does not have a strong interest in the comparative aspects of this study. Fair enough. This report is primarily aimed at biologists interested in the evolutionary history of insect wings.

      Also, it fails to explain what the authors mean by "wing domains" and "domain specification". They are not alone, butterfly workers, in general, appear vague about these concepts, their vagueness allowing too much loose thinking.

      A domain is “a region distinctively marked by some physical feature”. This term is used extensively in the developmental biology literature (e.g. “expression domain”, “embryonic domain”, “tissue domain”, “domain specification”) and is found throughout popular textbooks (e.g. Alberts et al. “The Cell”, Gilbert “Developmental Biology”). We prefer the term “domain” because of its association in the Drosophila literature with transcription factors that define fields of cells. We specifically avoided using the term “compartment” because of its association with cell lineage, which we have not tested. 

      Since these matters are at the heart of the purpose and meaning of the work reported here, we readers need a paper containing more critical thought and information. I would like to have a better and more logical introduction and discussion.

      We would like the very same thing, of course, and we hope the reviewer finds our revised manuscript to be more satisfying to read.

      The authors do define what they mean by the vannus of the wing. In flies the definition of compartments is clear and abundantly demonstrated, with gene expression and requirement being limited precisely to sets of cells that display lineage boundaries. It is true that domains of gene expression in flies, for example of the iroquois complex, which includes mirror, can only be related to patterns with difficulty. Some recap of what is known plus the opinion of the authors on how they interpret papers on possible lineage domains in butterflies might also be useful, as the reader is no wiser about what the authors might mean at the end of it!

      We thank the reviewer for this suggestion. However, our experiments have little to contribute to the topic of cell lineage compartmentalization. We have therefore opted to avoid speculating on this topic to prevent confusion and to keep the manuscript focused on our experimental results.

      The references are sometimes inappropriate. The discovery of the AP compartments should not be referred to Guillen et al 1995, but to Morata and Lawrence 1975. Proofreading is required.

      We thank the reviewer for suggesting this important reference. We have included it in our revision.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Chatterjee et al. examines the role of the mirror locus in patterning butterfly wings. The authors examine the pattern of mirror expression in the common buckeye butterfly, Junonia coenia, and then employ CRISPR mutagenesis to generate mosaic butterflies carrying clones of mirror mutant cells. They find that mirror is expressed in a well-defined posterior sector of final-instar wing discs from both hindwings and forewings and that CRISPR-injected larvae display a loss of adult wing structures presumably derived from the mirror expressing region of hindwing primordium (the case for forewings is a bit less clear since the mirror domain is narrower than in the hindwing, but there also do seem to be some anomalies in posterior regions of forewings in adults derived from CRISPR injected larvae). The authors conclude that the wings of these butterflies have at least three different fundamental wing compartments, the mirror domain, a posterior domain defined by engrailed expression, and an anterior domain expressing neither mirror nor engrailed. They speculate that this most posterior compartment has been reduced to a rudiment in Drosophila and thus has not been adequately recognized as such a primary regional specialization.

      Critique:

      This is a very straightforward study and the experimental results presented support the key claims that mirror is expressed in a restricted posterior section of the wing primordium and that mosaic wings from CRISPR-injected larvae display loss of adult wing structures presumably derived from cells expressing mirror (or at least nearby). The major issue I have with this paper is the strong interpretation of these findings that lead the authors to conclude that mirror is acting as a high-level gene akin to engrailed in defining a separate extreme posterior wing compartment. To place this claim in context, it is important in my view to consider what is known about engrailed, for which there is ample evidence to support the claim that this gene does play a very ancestral and conserved function in defining posterior compartments of all body segments (including the wing) across arthropods.

      (1) Engrailed is expressed in a broad posterior domain with a sharp anterior border in all segments of virtually all arthropods examined (broad use of a very good panspecies anti-En antibody makes this case very strong).

      (2) In Drosophila, marked clones of wing cells (generated during larval stages) strictly obey a straight anterior-posterior border indicating that cells in these two domains do not normally intermix, thus, supporting the claim that a clear A/P lineage compartment exists.

      In my opinion, mirror does not seem to be in the same category of regulator as engrailed for the following reasons:

      (1) There is no evidence that I am aware of, either from the current experiments, or others that the mirror expression domain corresponds to a clonal lineage compartment. It is also unclear from the data shown in this study whether engrailed is co-expressed with mirror in the posterior-most cells of J. coenia wing discs. If so, it does not seem justified to infer that mirror acts as an independent determinant of the region of the wing where it is expressed.

      (2) Mirror is not only expressed in a posterior region of the wing in flies but also in the ventral region of the eye. In Drosophila, mirror mutants not only lack the alula (derived approximately from cells where mirror is expressed), but also lack tissue derived from the ventral region of the eye disc (although this ventral tissue loss phenotype may extend beyond the cells expressing mirror).

      In summary, it seems most reasonable to me to think of mirror as a transcription factor that provides important development information for a diverse set of cells in which it can be expressed (posterior wing cells and ventral eye cells) but not that it acts as a high-level regulator as engrailed.

      Recommendation:

      While the data provided in this succinct study are solid and interesting, it is not clear to me that these findings support the major claim that mirror defines an extreme posterior compartment akin to that specified by engrailed. Minimally, the authors should address the points outlined above in their discussion section and greatly tone down their conclusion regarding mirror being a conserved selector-like gene dedicated to establishing posterior-most fates of the wing. They also should cite and discuss the original study in Drosophila describing the mirror expression pattern in the embryo and eye and the corresponding eye phenotype of mirror mutants: McNeill et al., Genes & Dev. 1997. 11: 1073-1082; doi:10.1101/gad.11.8.1073.

      We thank the reviewer for their summary, critique, and recommendations. We agree with everything the reviewer says. Honestly, however, we were surprised by these comments because we took great care in the paper to never refer to mirror as a compartmentalization gene or claim it has a function in cell lineage compartmentalization like engrailed. As pointed out, we lack clonal analyses to test for compartmentalization. This is why we used the term “domain” instead of “compartment” in the title and throughout the manuscript. Nevertheless, we have recrafted the discussion in the manuscript, including completely removing the term “compartment”, to better avoid implications that mirror plays a role in cell lineage compartmentalization. 

      We also thank the reviewer for recommending the paper about the role of mirror in eye development. For the sake of keeping the paper focused, however, we decided not to broach the topic of mirror functions outside the context of wing development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have minor comments for improvement.

      The abstract and introductions are terminologically problematic when they refer to the concept of compartment and compartment boundaries. Allegedly this confusion has previously propagated in several articles related to butterfly wing development, which keeps alienating this literature from being taken seriously by fly specialists, for example. So it is important to use the right terms. I will try to explain point by point here, but I would appreciate it if the authors could undertake a significant rewrite taking these comments into account. The authors use the terms compartment and compartment boundary. This has a very specific use in developmental genetics: mitotic clones never cross a boundary (or compartment). I think the authors can keep referring to the equivalent of the A-P boundary, which is situated somewhere between M1-M2 based on unpublished data from the Patel Lab, and is not very well defined (Engrailed expression moves a little bit during development in this area). Domain is a looser term and can be used more liberally to describe genetically defined regions.

      - "Classical morphological work subdivides insect wings into several distinct domains along the antero-posterior (AP) axis, each of which can evolve relatively independently." Yes. This concept of domain and individuation seems important. You could make a proposed link to selector genes here.

      - "There has been little molecular evidence, however, for AP subdivision beyond a single compartment boundary described from Drosophila melanogaster." Incorrect, and this conflates "domain" and "compartment".

      Flies have wing AP domains too, that pattern their veins (see the cited Banerjee et al). 

      - "Our results confirm that insect wings can have more than one posterior developmental domain, and support models of how selector genes may facilitate evolutionarily individuation of distinct AP domains in insect wings". Yes, and I like the second part of the sentence. Still, I would recommend simply deleting "confirm that insect wings can have more than one posterior developmental domain, and" because this is neglecting previous work on AP genetic regionalization in both flies (vein literature) and butterflies (e.g. McKenna and Nijhout, Banerjee et al).

      - "Analyses of wing pattern diversity across butterflies, considering both natural variation and genetic mutants, suggest that wings can be subdivided into at least five AP domains, bounded by the M1, M3, Cu2, and 2A veins respectively, within each of which there are strong correlations in color pattern variation and wing morphology (Figure 1A)". Yes, and I would recommend emphasizing they correspond to well-defined gene expression domains as mentioned in Banerjee et al, or McKenna and Nijhout.

      - "The anterior-most of these domains, bordered by the M1 vein, appears to correspond to an AP compartment boundary originally described by cell lineage tracing in Drosophila melanogaster, and later supported in butterfly wings by expression of the Engrailed transcription factor. Interestingly, however, D. melanogaster work has yet to reveal clear evidence for additional AP domain boundaries in the wing." Confusing, because the first sentence is about compartments while the second is about AP domains. I also think the claim that Dmel has no other known AP domains is dubious because Spalt is highly regionalized in flies.

      - "Previous authors have proposed the existence of such individuated domains, and speculated that they may be specified by selector genes.5,10 Our data provide experimental support for this model, and now motivate us to identify factors that specify other domain boundaries between the M1 and A2 veins." Yes, I completely agree with this way to emphasize the selector effect, and to link it to the concept of "individuated domain"

      We cannot thank the reviewer enough for the time and thought they devoted to giving helpful suggestions to improve our manuscript. We have applied all of the above recommendations to the revision.

      Fig. S1: the field needs to move away from Red/Green microscopy images, for accessibility reasons.

      The easiest fix here would be to change the red channels to magenta.

      Green/Magenta provides excellent contrast and accessibility in general in 2-channel images.

      We thank the reviewer for this suggestion. We have improved the color accessibility of Fig. S1.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kv2 subfamily potassium channels contribute to delayed rectifier currents in virtually all mammalian neurons and are encoded by two distinct types of subunits: Kv2 alpha subunits that have the capacity to form homomeric channels (Kv2.1 and Kv2.2), and KvS or silent subunits (Kv5,6,8,9) that can assemble with Kv2.1 or Kv2.2 to form heteromeric channels with novel biophysical properties. Many neurons express both types of subunits and therefore have the capacity to make both homomeric Kv2 channels and heteromeric Kv2/KvS channels. Determining the contributions of each of these channel types to native potassium currents has been very difficult because the differences in biophysical properties are modest and there are no Kv2/KvS-specific pharmacological tools. The authors set out to design a strategy to separate Kv2 and Kv2/KvS currents in native neurons based on their observation that Kv2/KvS channels have little sensitivity to the Kv2 pore blocker RY785 but are blocked by the Kv2 VSD blocker GxTx. They clearly demonstrate that Kv2/KvS currents can be differentiated from Kv2 currents in native neurons using a two-step strategy to first selectively block Kv2 with RY785, and then block both with GxTx. The manuscript is beautifully written; takes a very complex problem and strategy and breaks it down so both channel experts and the broad neuroscience community can understand it.

      Strengths:

      The compounds the authors use are highly selective and unlikely to have significant confounding cross-reactivity to other channel types. The authors provide strong evidence that all Kv2/KvS channels are resistant to RY785. This is a strength of the strategy - it can likely identify Kv2/KvS channels containing any of the 10 mammalian KvS subunits and thus be used as a general reagent on all types of neurons. The limitation then of course is that it can't differentiate the subtypes, but at this stage, the field really just needs to know how much Kv2/KvS channels contribute to native currents and this strategy provides a sound way to do so.

      Weaknesses:

      The authors are very clear about the limitations of their strategy, the most important of which is that they can't differentiate different subunit combinations of Kv2/KvS heteromers. This study is meant to be a start to understanding the roles of Kv2/KvS channels in vivo. As such, this is a minor weakness, far outweighed by the potential of the strategy to move the field through a roadblock that has existed since its inception.

      The study accomplishes exactly what it set out to do: provide a means to determine the relative contributions of homomeric Kv2 and heteromeric Kv2/KvS channels to native delayed rectifier K+ currents in neurons. It also does a fabulous job laying out the case for why this is important to do.

      Reviewer #2 (Public Review):

      Summary:

      Silent Kv subunits and the channels containing these Kv subunits (Kv2/KvS heteromers) are in the process of discovery. It is believed that these channels fine-tune the voltage-activated K+ currents that repolarize the membrane potential during action potentials, with a direct effect on cell excitability, mostly by determining action potentials firing frequency.

      Strengths:

      What makes silent Kv subunits even more important is that, by being expressed in specific tissues and cell types, different silent Kv subunits may have the ability to fine-tune the delayed rectifying voltage-activated K+ currents that are one of the currents that crucially determine cell excitability in these cells. The present manuscript introduces a pharmacological method to dissect the voltage-activated K+ currents mediated by Kv2/KvS heteromers as a means of starting to unveil their importance, together with Kv2-only channels, to the cells where they are expressed.

      Weaknesses:

      While the method is effective in quantifying these currents in any isolated cell under an electric voltage clamp, it is ineffective as a modulating maneuver to perhaps address these currents in an in vivo experimental setting. This is an important point but is not a claim made by the authors.

      We agree. We have now stated in the introduction that this study does not address the roles of Kv2/KvS currents in an in vivo setting.

      Manuscript revisions:

      While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.  

      There are other caveats with the methods and data:

      (i) The need for a 'cocktail' of blockers to supposedly isolate the currents of Kv2 homomers and Kv2/KvS heteromers from others may introduce errors in the quantification of Kv2/KvS heteromer-mediated K+ currents, owing to possible off-target effects of the blockers.

      We now point out that it is possible that off-target effects of blockers may introduce errors, include references that identify the selectivity of the blockers used in the cocktail, and specifically note that 4-aminopyridine in the cocktail is expected to block 2% of Kv2 homomers yet have a lesser impact on Kv2/KvS heteromers. Additionally, to test whether the KvS isolation strategy requires the cocktail in neurons, we performed new experiments on a different subclass of nociceptors without the blocker cocktail and identified a substantial KvS-like component (new Fig 7 Supplement 3).

      Manuscript revisions:

      “After whole-cell voltage clamp was established, non-Kv2/KvS conductances were suppressed by changing to an external solution containing a cocktail of inhibitors: 100 nM alpha-dendrotoxin (Alomone) to block Kv1 (Harvey and Robertson, 2004), 3 μM AmmTX3 (Alomone) to block Kv4 (Maffie et al., 2013; Pathak et al., 2016), 100 μM 4-aminopyridine to block Kv3 (Coetzee et al., 1999; Gutman et al., 2005), 1 μM TTX to block TTX sensitive Nav channels, and 10 μM A803467 (Tocris) to block Nav1.8 (Jarvis et al., 2007). It is possible that off target effects of blockers may introduce errors in the quantification of Kv2/KvS heteromer-mediated K<sup>+</sup> currents. For example, 4-aminopyridine is expected to block a small fraction, 2%, of Kv2 homomers and have a lesser impact on Kv2/KvS heteromers (Post et al., 1996; Thorneloe and Nelson, 2003; Stas et al., 2015) which could result in a slight overestimation of the ratio of Kv2/KvS heteromers to Kv2 homomers.”
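
      The size of the overestimation described above can be illustrated with a short calculation. The following is a minimal Python sketch, not the authors' analysis: the function name `apparent_ratio` is hypothetical, and it assumes the limiting case in which 4-aminopyridine blocks 2% of Kv2 homomers and none of the Kv2/KvS heteromers (the text states only that the impact on heteromers is "lesser").

```python
def apparent_ratio(true_kvs, true_kv2, kv2_block=0.02, kvs_block=0.0):
    """Apparent KvS-like : Kv2-like conductance ratio after partial block.

    kv2_block and kvs_block are the fractions of each conductance removed
    by a cocktail component. kvs_block=0.0 is an assumed limiting case.
    """
    remaining_kvs = true_kvs * (1.0 - kvs_block)
    remaining_kv2 = true_kv2 * (1.0 - kv2_block)
    return remaining_kvs / remaining_kv2


# Hypothetical example: equal true conductances, so the true ratio is 1.0.
true_ratio = 1.0
measured_ratio = apparent_ratio(true_kvs=1.0, true_kv2=1.0)
overestimate_percent = 100.0 * (measured_ratio / true_ratio - 1.0)
print(f"apparent ratio: {measured_ratio:.4f}")
print(f"overestimate: {overestimate_percent:.2f}%")
```

      Under these assumptions the apparent ratio is inflated by a factor of 1/0.98, roughly 2%, which is consistent with the "slight overestimation" described in the revision. Any block of the heteromers by 4-aminopyridine would make the bias smaller.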

      “We also tested the other major mouse C-fiber nociceptor population, peptidergic nociceptors, to determine if this subpopulation also has conductances resistant to RY785 yet sensitive to GxTX. We voltage clamped DRG neurons from a CGRP<sup>GFP</sup> mouse line that expresses GFP in peptidergic nociceptors (Gong et al., 2003). Deep sequencing has identified mRNA transcripts for Kv6.2, Kv6.3, Kv8.1 and Kv9.3 present in GFP+ neurons, an overlapping but distinct set of KvS subunits from the Mrgprd<sup>GFP</sup> non-peptidergic population (Zheng et al., 2019). In GFP+ neurons from CGRP<sup>GFP</sup> mice, we found that a fraction of outward current was inhibited by 1 µM RY785 and additional current inhibited by 100 nM GxTX (Fig 7 Supplement 3 A-C). In these experiments, 58 ± 2% (mean ± SEM) was KvS-like (Fig 7 Supplement 3 D) identifying that KvS-like conductances are present in these peptidergic nociceptors. For CGRP<sup>GFP</sup> neurons we did not include the Kv1, Kv3, Kv4, Nav and Cav channel inhibitor cocktail used for other neuron experiments, indicating that the cocktail of inhibitors is not required to identify KvS-like conductances.”

      (ii) During the electrophysiology experiments, the authors use a holding potential that is not as negative as is needed to record the full population of Kv2/KvS channels. Depolarized holding potentials lead to a certain level of inactivation of the channels, which varies according to the KvS involved/present in that specific population of channels. As a reminder, some KvS promote inactivation and others prevent inactivation. Therefore, the data must be interpreted as such.

      We agree. We now point out that the physiological holding potentials used are insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. We also note that the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.

      Manuscript revisions:

      “Neurons were held at a membrane potential of –74 mV to mimic a physiological resting potential. KvS subunits can profoundly shift the voltage-inactivation relation (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and this potential is likely insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. Also, the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (iii) The analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. Also, in dealing with a heterogenous population of Kv2/KvS heteromers, heterogenous K+ conductance deactivation kinetics is a must. Indeed, different KvS may significantly relate to different deactivation kinetics as well.

      We now discuss that the bi-exponential fit of tail currents is likely inadequate to capture the deactivation kinetics of all underlying components of a heterogenous population of Kv2/KvS heteromers.

      Manuscript revisions:

      “We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (iv) Silent Kv subunits may be retained in the ER in heterologous systems like CHO cells. This may lead to underestimation of their expression in these systems. Nevertheless, the authors show similar data in CHO cells and in primary neurons.

      We agree. We now note that in heterologous systems, including CHO cells, transfection of KvS subunits can result in KvS subunits that are retained intracellularly.

      Manuscript revisions:

      “While a fraction of KvS subunits appear to be retained intracellularly, immunofluorescence for Kv5.1, Kv9.3 and Kv2.1 also appeared localized to the perimeter of transfected Kv2.1-CHO cells (Figure 1 Supplement).”

      (v) The hallmark of silent Kv subunits is their effect on the time inactivation of K+ currents. As such, data should be shown throughout, preferably, from this perspective, but it was only done so in Figure 4G.

      Indeed, effects on inactivation are a hallmark of KvS subunits. However, quantifying inactivation of Kv2/KvS channels requires steps to positive voltages for approximately 10 seconds. In neurons steps this long usually resulted in irreversible changes in leak currents/input resistance that degraded the accuracy of RY785/GxTX subtraction currents. Consequently, we did not acquire inactivation data in neurons, and we now explain in the manuscript why such data was not obtained.

      Manuscript revisions:

      “While changes in inactivation are prominent with KvS subunits, we did not investigate inactivation in neurons because the lengthy depolarizations required often resulted in irreversible leak current increases that degraded the accuracy of RY785/GxTX subtraction current quantification.”

      (vi) Functional characterization of currents only, as suggested by the authors as a bona fide of Kv2 and Kv2/KvS currents, should not be solely trusted to classify the currents and their channel mediators.

      We agree, and now state explicitly that functional characterization alone cannot be trusted to classify the channel mediators of conductances, and we try to be clear about this throughout the manuscript by using soft terms such as "KvS-like" when identity is uncertain.

      Manuscript revisions:

      “As functional characterization alone cannot be trusted to classify their channel mediators of conductances, we define conductances consistent with Kv2/KvS heteromers as 'KvS-like' and conductances consistent with Kv2 homomers as 'Kv2-like'.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There is not a lot to do here - this was a real pleasure to read and very easy to understand, as written. Here are a few minor things to consider:

      (1) The naming of the KvS subunits has always been confusing - it is not clear that Kv5,6,8,9 are members of the Kv2 subfamily from the names. KvS does a good job of differentiating them by assembly phenotype and has been used a lot in the literature, but it doesn't solve the misconception of what subfamily they belong to. This might not matter so much for mammals, where all KvS channels are in the Kv2 subfamily, but it makes it impossible to extend the naming system to other animals where subunits requiring heteromeric assembly are common in most subfamilies. How about trying the name Kv2S? It would have continuity with KvS in the reader's mind, make it clear that they are Kv2 subfamily, and make a naming system that could be extended beyond vertebrates. This is not a problem the authors created - just a completely optional suggestion on how to solve it if so inclined.

      We agree that naming conventions for these subunits are problematic, and agonized quite a bit about nomenclature. In the end we chose to stick with the precedent of KvS.

      (2) Another naming issue they should definitely change is the use of "subfamily" for the different KvS subtypes (Kv5, Kv6, Kv8, and Kv9). This really creates confusion with the higher-order subfamilies that have a very clear functional definition: a subfamily of Kv genes is a group of related genes that have assembly compatibility. Those are Kv1, Kv2, Kv3 and Kv4. KvS genes are assembly compatible with Kv2, evolutionarily derived from the Kv2 lineage, and thus clearly a part of the Kv2 subfamily. Using a subfamily for the next lower level of the naming hierarchy confuses this. The authors should use different terms like sub-type or class or subgroups for the divisions within KvS.

      Thank you. We have standardized to Kv2/KvS as a subfamily; Kv5, Kv6, Kv8, and Kv9 as subtypes; and individual proteins, e.g. Kv8.1, as subunits.

      (3) When you discuss whether the KvS subunit directly disrupts Ry785 binding in the pore or works allosterically and you said you know which KvS residues point into the pore from models, I thought that maybe you could tell from a sequence alignment whether the KvS channels you didn't test look the same in the conduction pathway as the ones you did test. If so, you could mention that if the binding site is the pore, they should all be resistant. Alternatively, if one you didn't test looks fundamentally more similar to the Kv2s in this region, then maybe it could be fingered as a possible exception that needs to be tested later.

      Great ideas. We now assess sequence KvS variability near the proposed RY785 binding site in all KvS subunits. We generated structural models of RY785 docking to Kv2.1 and Kv2.1/Kv8.1 and found that residues near RY785 are different in all KvS subunits.

      Manuscript revisions:

      “We analyzed computational structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (Fig 9) to gain structural insight into how KvS subunits might interfere with RY785 binding. We used Rosetta to dock RY785 to a cryo-EM structure of a Kv2.1 homomer in an apparently open state (Fernández-Mariño et al., 2023). The top-scoring docking pose has RY785 positioned below the selectivity filter and off-axis of the pore (Fig 9 A), similar to a stable pose observed in molecular dynamic simulations (Zhang et al., 2024). In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open.”

      (4) Future suggestion or tip - not for this paper. Your data shows your isolation strategy works really well on Kv6 channels, and these are also the Kv2/KvS channels that have the most pronounced biophysical changes. Working on neurons that have a prominent Kv2/Kv6 component would really show how well the strategy outlined here works to describe the physiology of native neurons. The highest KvS expression I have seen in public data in a well-studied cell type is Kv6.4 in spinal motor neurons.

      Wonderful tip, thank you. We are indeed very interested in Kv6.4 in spinal motor neurons.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript makes a good contribution to the identification of Kv2/KvS channels in primary cells. The pharmacological method proposed by the authors to dissect the currents in an experimental setting seems proper. Although meritorious in themselves, the findings are heavily phenomenological in the opinion of this reviewer. The manuscript should be improved with some level of mechanistic data and/or the demonstration of different levels of expression in different cell types.

      Thank you for the suggestions. This manuscript now demonstrates strikingly higher levels of the KvS-like component of Kv2 currents in somatosensory (DRG nonpeptidergic and peptidergic nociceptor) versus autonomic (SCG) neuron types. The mechanistic question of what electrophysiological properties the KvS subunits are providing to the neuronal circuit is an exciting one that we are pursuing separately.

      Manuscript revisions:

      “While we found only RY785-sensitive Kv2-like conductances in SCG neurons, Kv2/KvS heteromer-like conductances were dominant in DRG neurons.”

      At present, the manuscript says that the combination of RY785 and guangxitoxin-1E can be used to define Kv2/KvS-mediated K+ currents. Importantly, this method cannot be used in a way that one can functionally determine the function of Kv2/KvS channels, since it depends on the pre-blocking of Kv2-mediated K+ currents prior. In the opinion of this reviewer, this fact decreases the attention of a potential reader.

      Indeed, our study is focused on revealing KvS heteromers by voltage clamp, and we now clarify in the introduction that we do not determine the function of Kv2/KvS channels in this study, so as not to lead the reader to expect studies of neuronal signaling.

      However, the selective pharmacology we identify suggests RY785 application could reveal the function of Kv2 homomers, and for RY785-insensitive signaling, GxTX application could reveal the function of Kv2/KvS heteromers. We now mention these possible applications in the Discussion.

      Manuscript revisions:

      “While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.”

      Please find below suggestions for improving the manuscript:

      (1) The term "Kv2/KvS heteromers" should be used throughout instead of variations such as "Kv2/KvS channels", "Kv2/KvS" and others. Standardization of the term to refer to heteromers would make the manuscript easier to read.

      Thank you. We have standardized terms to consistently refer to Kv2/KvS heteromers.

      (2) Confusing terms like KvS conductances, KvS-like conductances, KvS-like (RY785-resistant, GxTX-sensitive) currents, and KvS channels should be avoided because they disregard the current belief that KvS cannot form functional homomeric channels. The term KvS-containing channels, and Kv2/KvS channels, seem more accurate. Uniformization in this regard will also make the manuscript more easily readable.

      Thank you. We have standardized terms to Kv2/KvS heteromers and KvS-containing channels when channel subunits are known, and use the terms KvS-like and Kv2-like for functionally identified endogenous conductances with unknown channel subunits.

      (3) Referring to KvS as a regulatory subunit is inaccurate. It is clear that KvS is part of, and it makes up the alpha pore. KvS therefore is a part of the conductive pathway and not a regulatory (suggesting accessory) subunit. KvS take part in selectivity filter (fully conserved), but they also make up an important part of the conducting pathway with non-conserved amino acid residues.

      We felt it important to include the descriptor “regulatory” to connect our nomenclature with prior use of the descriptor in the literature, and now only use the term at the start of the introduction.

      Manuscript revisions:

      “A potential source of molecular diversity for Kv2 channels are a group of Kv2-related proteins which have been referred to as regulatory, silent, or KvS subunits.”

      (4) The use of a cocktail of channel inhibitors may affect the quantification of Kv2/KvS heteromers-mediated K+ currents because they may interact with RY785 and/or GxTx or they may even interact with the sites for these two drugs on Kv2-containing channels.

      This is an interesting point worth considering, thank you. We now alert readers to this possibility in the discussion when considering the limitations of our approach.

      Manuscript revisions:

      “Also, the cocktail of inhibitors used in most neuron experiments here could potentially alter RY785 or GxTX action against KvS/Kv2 channels.”

      (5) The graphical representation of fractional blocking and other parameters (e.g., Fig 1D), is hard to read in these slim plots. In my opinion, tall bars would be more meaningfully visualized.

      Thank you for pointing out that the graphs were hard to read. We have made the graphs easier to read and added tall bars.

      (6) Vehicle control for IHC and electrophysiology. Please state what is the vehicle used in the electrophysiology experiments.

      Thank you. The composition of vehicle has now been stated in the methods.

      Manuscript revisions:

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      “Sections were incubated in vehicle solution (4% milk, 0.2% triton diluted in PB) for 1 hr at RT.”

      (7) The reference Trapani & Korn, 2003 (?) is not included in the list. This reference is important since it sets what are the Kv2.1-CHO cells. In this regard it is also important to mention, even better to address, the expressing qualities of this system in the face of a co-expression with a plasmid-based expression of silent Kv subunits. Are these two ways of expressing Kv subunits, meant to come together (or not) in heteromers, balanced? This question is critical here. Still, in regard to Kv2.1-CHO cells, it was not clear in the manuscript if the term "transfection" refers only to the plasmids used to temporarily induce the expression of silent Kv subunits and potentially Kv channels accessory subunits.

      We now include the Trapani & Korn, 2003 reference (thank you for pointing out this accidental omission), and better explain expression methods. The benefit of the inducible Kv2.1 expression is control of Kv conductance densities which can otherwise become so large as to be refractory to voltage clamp. The beauty of the expression system is that cells recently transfected with KvS subunits can be induced to express just enough Kv2.1 to get a substantial but not clamp-overwhelming RY785-resistant Kv2/KvS conductance. We also discuss that our expression methods are distinct from past studies. We stop short of comparing the expression systems, as this is beyond the scope of what we set out to study.

      Manuscript revisions: See next response

      (8) Kv2.1-CHO cells transfection procedures, induction, and validation are unclear. This validation is important here.

      We have clarified transfection procedures, induction, and validation in the methods section.

      Manuscript revisions:

      “The CHO-K1 cell line transfected with a tetracycline-inducible rat Kv2.1 construct (Kv2.1-CHO) (Trapani and Korn, 2003) was cultured as described previously (Tilley et al., 2014).”

      “Transfections were achieved with Lipofectamine 3000 (Life Technologies, L3000001). 1 μl Lipofectamine was diluted, mixed, and incubated in 25 μl of Opti-MEM (Gibco, 31985062).”

      “Concurrently, 0.5 μg of KvS or AMIGO1 or Navβ2, 0.5 μg of pEGFP, 2 μl of P3000 reagent and 25 μl of Opti-MEM were mixed. DNA and Lipofectamine 3000 mixtures were mixed and incubated at room temperature for 15 min. This transfection cocktail was added to 1 ml of culture media in a 24 well cell culture dish containing Kv2.1-CHO cells and incubated at 37 °C in 5% CO2 for 6 h before the media was replaced. Immediately after media was replaced, Kv2.1 expression was induced in Kv2.1-CHO cells with 1 μg/ml minocycline (Enzo Life Sciences, ALX380-109-M050), prepared in 70% ethanol at 2 mg/ml. Voltage clamp recordings were performed 12-24 hours later. We note that the expression method of Kv2/KvS heteromers used here is distinct from previous studies which show that the KvS:Kv2 mRNA ratio can affect the expression of functional Kv2/KvS heteromers (Salinas et al., 1997b; Pisupati et al., 2018). We validated the functional Kv2/KvS heteromer expression using voltage clamp to establish distinct channel kinetics and the presence of RY785-resistant conductance in KvS-transfected cells and using immunohistochemistry to label apparent surface localization of KvS subunits (Figure 4, Figure 1 Supplement, Figure 1 and Figure 5).”

      (9) It is important for readers to add some context to Kv2.1/Kv8.1 channels (and other Kv2/KvS heteromers) used to test the combination of RY785 and GxTx. In my opinion, this enriches the discussion.

      Good idea. We have added context about each of the KvS subunits we test.

      Manuscript revisions:

      “To test the pharmacological response of KvS we began with Kv8.1, a subunit that creates heteromers with biophysical properties distinct from Kv2 homomers (Salinas et al., 1997a), and modulates motor neuron vulnerability to cell death (Huang et al., 2024).

      Each of these KvS subunits create Kv2/KvS heteromers that have distinct biophysical properties (Kramer et al., 1998; Kerschensteiner and Stocker, 1999; Bocksteins et al., 2012). Kv5.1/Kv2.1 heteromers play an important role in controlling the excitability of mouse urinary bladder smooth muscle (Malysz and Petkov, 2020), mutations in Kv6.4 have been shown to influence human labor pain (Lee et al., 2020b), and deficiency of Kv9.3 disrupts parvalbumin interneuron physiology in mouse prefrontal cortex (Miyamae et al., 2021).”

      (10) In general, the membrane potential used to activate Kv2 only channels and Kv2/KvS channels is too close to the activation V1/2. In case the comparing curves are displaced in their relative voltage dependence and voltage sensitivity, using that range of membrane potential may introduce a crucial error in the estimation of the conductance's relative amplitudes.

      We now note that the relative conductances of Kv2-only vs Kv2/KvS channels are expected to vary with voltage protocol, as KvS inclusion results in channels with altered voltage responses.

      Manuscript revisions:

      “…the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (11) The use of tail currents to estimate conductance is problematic if i) lack of current inactivation is not assured, and ii) if the different currents, with possible different deactivation kinetics at the used membrane potential (e.g., mV), are not assured. Why was the activation peak used at times, and at different elapsed times the tail currents were used instead? These aspects of conductance's amplitude estimation methods should be well defined.

      In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We have clarified this analysis in the methods section.

      Manuscript revisions:

      “In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. In neurons, voltage gated currents remained in the toxin cocktail + RY785 and GxTX, that were sometimes unstable. To minimize complications from these currents, we restricted analysis of RY785 and GxTX subtraction experiments to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (12) Were the experiments including different conditions such as control, RY, and RY+GxTx done pairwise? This could potentially improve the statistics and strengthen the data and the conclusions drawn from them.

      The control, RY, and RY+GxTX in neurons were done pairwise and the statistical tests performed for these experiments were pairwise tests. We have clarified this in the figure legends.

      Manuscript revisions:

      “Wilcoxon rank tests were paired, except the comparison of RY785 to vehicle which was unpaired.”
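
      The paired comparison referred to above can be illustrated with an exact Wilcoxon signed-rank test on per-cell measurements. The following is a minimal Python sketch, not the authors' analysis code: the function name `signed_rank_exact_p` and the example amplitudes are hypothetical, and the sketch assumes there are no zero or tied differences. The unpaired RY785-versus-vehicle comparison would need a rank-sum test instead, which is not shown.

```python
def signed_rank_exact_p(before, after):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Simplifying assumption: no zero differences and no tied absolute
    differences, so ranks are simply 1..n.
    """
    diffs = [a - b for b, a in zip(before, after)]
    abs_diffs = [abs(d) for d in diffs]
    if 0 in abs_diffs or len(set(abs_diffs)) != len(abs_diffs):
        raise ValueError("sketch handles only non-zero, untied differences")

    n = len(diffs)
    ordered = sorted(diffs, key=abs)  # rank 1 = smallest |difference|
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    max_sum = n * (n + 1) // 2
    w = min(w_plus, max_sum - w_plus)

    # counts[s] = number of subsets of the ranks 1..n whose sum is s.
    # Under the null hypothesis every subset is equally likely to be
    # the set of positive ranks.
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]

    tail = sum(counts[: w + 1])
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical tail-current amplitudes (arbitrary units) from six cells,
# each measured before and after a blocker.
control = [10.0, 12.0, 9.0, 11.0, 14.0, 13.0]
after_blocker = [6.0, 7.5, 5.5, 8.5, 9.0, 10.0]
p_value = signed_rank_exact_p(control, after_blocker)
print(p_value)
```

      Because each cell serves as its own control, the test uses only the sign and rank of the within-cell change. With six cells that all change in the same direction, the smallest attainable two-sided p-value is 2/64, about 0.031.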

      (13) The holding potential of the experiments, mostly -89 mV, may be biasing the estimation of Kv2 only channels vs. Kv2/KvS channels conductances. Figure 4I exemplifies this concern.

      We agree. Figure 4I reveals that a holding potential of -89 mV vs -129 mV reduces conductance of Kv2.1/Kv8.1 heteromers vs Kv2.1 homomers in CHO cells by ~20%. We have now alerted readers that the ratio of Kv2 only channels vs. Kv2/KvS conductances can vary with holding voltage.

      Manuscript revisions:

      “Under these conditions, 58 ± 3 % (mean ± SEM) of the delayed rectifier conductance was resistant to RY785 yet sensitive to GxTX (KvS-like) (Fig 7 F). We note that the ratio of KvS- to Kv2-like conductances is expected to vary with holding potential, as KvS subunits can change the degree and voltage-dependence of steady state inactivation (e.g. Fig 4I).”
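
      The KvS-like percentage quoted above follows from the two-step subtraction. The following is a minimal Python sketch under stated assumptions, not the authors' analysis code: the function names and the example amplitudes are hypothetical, and it assumes the denominator is the sum of the RY785-sensitive (Kv2-like) and RY785-resistant, GxTX-sensitive (KvS-like) components.

```python
import math
import statistics


def kvs_like_fraction(control, after_ry, after_ry_gxtx):
    """Fraction of the RY785- or GxTX-sensitive amplitude that is KvS-like.

    Kv2-like = control - after_ry        (blocked by RY785)
    KvS-like = after_ry - after_ry_gxtx  (resistant to RY785, blocked by GxTX)
    The denominator (sum of both components) is an assumption of this sketch.
    """
    kv2_like = control - after_ry
    kvs_like = after_ry - after_ry_gxtx
    total = kv2_like + kvs_like
    if total <= 0:
        raise ValueError("no blocker-sensitive current to partition")
    return kvs_like / total


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical tail-current amplitudes (pA) per neuron:
# (control, after RY785, after RY785 + GxTX)
cells = [
    (1000.0, 600.0, 100.0),
    (800.0, 500.0, 80.0),
    (1200.0, 650.0, 150.0),
]
fractions = [kvs_like_fraction(*cell) for cell in cells]
mean_fraction, sem_fraction = mean_sem(fractions)
print(f"KvS-like: {100 * mean_fraction:.0f} ± {100 * sem_fraction:.0f} % (mean ± SEM)")
```

      Any residual current that survives both blockers is excluded from the denominator here. A different choice, such as normalizing to the total control amplitude, would give a different percentage.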

      (14) It is possible that Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are the same, by mistake, since their noise pattern looks too similar.

      Indeed, the noise patterns of the Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are related, as they have inputs from the same trace: the Figure 6C ("Kv2-like" trace) is the Figure 6A (control trace) minus the Figure 6A (+RY trace).
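The shared noise follows from the subtraction itself. As a rough illustration only, the sketch below uses synthetic traces (the time course, amplitudes, noise level, and the 40% RY785-resistant fraction are all hypothetical and are not taken from our recordings) to show that a difference trace inherits the noise of the control trace it is computed from. With equal, independent noise on both traces, the correlation between the control noise and the noise in the subtracted trace is expected to be around 0.7.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples = 2000

# Hypothetical noiseless current time course (arbitrary units).
signal = [1.0 - math.exp(-t / 200.0) for t in range(n_samples)]

# Independent recording noise on the control and +RY traces (hypothetical level).
noise_control = [rng.gauss(0.0, 0.05) for _ in range(n_samples)]
noise_ry = [rng.gauss(0.0, 0.05) for _ in range(n_samples)]

control = [s + e for s, e in zip(signal, noise_control)]
# Assume, for illustration, that 40% of the current remains in RY785.
plus_ry = [0.4 * s + e for s, e in zip(signal, noise_ry)]

# "Kv2-like" trace = control trace minus +RY trace.
kv2_like = [c - r for c, r in zip(control, plus_ry)]

# Noise left in the subtracted trace once its noiseless component is removed.
kv2_noise = [k - 0.6 * s for k, s in zip(kv2_like, signal)]

shared_noise_r = pearson(noise_control, kv2_noise)
print(f"correlation of control noise with Kv2-like noise: {shared_noise_r:.2f}")
```

Two independent recordings would instead give a correlation near zero, which is why visually similar noise is expected here and does not indicate a duplicated trace.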

      (15) For example, in Figure 7A, what is the identity of the current remaining after the RY+GxTx application? In Figure 7B, a supposed outlier in the group of data referring to "veh" in the right panel is what possibly is making this group different from +RY in the left panel (p=0.02, Wilcoxon rank test). I would recommend parametric tests only since the data is essentially quantitative.

      In Figure 7A, we do not know the identity of the current remaining after the RY+GxTX application. The kinetics of the residual current appeared distinct from the Kv2/KvS-like currents blocked by RY or GxTX, but we did not analyze these.

      The data in Figure 7B were indeed from the positive outlier in the group of data referring to "veh" in the right panel, which contributes to the p-value, but we saw no reason to exclude it. We have now replaced the representative trace in 7B with a non-outlier trace. We respectfully disagree with the suggestion to use parametric statistical tests, as we do not know the distribution underlying the variance of our data.

      Manuscript revisions:

      “Subsequent application of 100 nM GxTX decreased tail currents by 68 ± 5% (mean ± SEM) of their original amplitude before RY785. We do not know the identity of the outward current that remains in the cocktail of inhibitors + RY785 + GxTX.”

      (16) Please state the importance of using nonpeptidergic neurons to study silent Kv5.1 and Kv9.1 subunits. RNA data may not necessarily work to probe function or protein abundance, which is crucial in heteromeric complexes.

      We have now more thoroughly explained our rationale for choosing the nonpeptidergic neurons.

      RNA is not predictive of protein abundance, and we have not yet been successful in measuring KvS protein abundance in these neurons, so we've probed KvS abundance by assessing RY785 resistance.

      Manuscript revisions:

      “Mouse dorsal root ganglion (DRG) somatosensory neurons express Kv2 proteins (Stewart et al., 2024), have GxTX-sensitive conductances (Zheng et al., 2019), and express a variety of KvS transcripts (Bocksteins et al., 2009; Zheng et al., 2019), yet transcript abundance does not necessarily correlate with functional protein abundance. To record from a consistent subpopulation of mouse somatosensory neurons which has been shown to contain GxTX-sensitive currents and have abundant expression of KvS mRNA transcripts (Zheng et al., 2019), we used a Mrgprd<sup>GFP</sup> transgenic mouse line which expresses GFP in nonpeptidergic nociceptors (Zylka et al., 2005; Zheng et al., 2019). Deep sequencing identified that mRNA transcripts for Kv5.1, Kv6.2, Kv6.3, and Kv9.1 are present in GFP+ neurons of this mouse line (Zheng et al., 2019) and we confirmed the presence of Kv5.1 and Kv9.1 transcripts in GFP+ neurons from Mrgprd<sup>GFP</sup> mice using RNAscope (Fig 7 Supplement 1).”

      (17) In Figure 8B, were +RY data different from veh data? The figure shows no Wilcoxon (nonparametric) comparison and this is important to be stated. What conductance(s) is the vehicle solution blocking or promoting? What is RY dissolved in, DMSO? What is the DMSO final concentration?

      We now state that in Figure 8B, +RY amplitudes were not statistically different from veh data in this limited data set. However, the RY-subtraction currents always had Kv2-like biophysical properties, whereas vehicle-subtraction currents had variable properties precluding biophysical analysis for Fig 8D.

      In Figure 8B, we do not know what conductance(s) the vehicle solution is affecting; we think the changes observed are likely merely time dependent or due to the solution exchange itself. RY stock is in DMSO. All recording solutions have a final concentration of 0.1% DMSO; this is now noted in the methods.

      Manuscript revisions:

      “Unlike mouse neurons, we did not detect a significant difference in tail currents of RY785 versus vehicle controls. However, RY785-subtracted currents always had Kv2-like biophysical properties whereas vehicle-subtraction currents had variable properties that precluded the same biophysical analysis. Overall, these results show that human DRG neurons can produce endogenous voltage-gated currents with pharmacology and gating consistent with Kv2/KvS heteromeric channels.”

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      (18) METHODS. The electrophysiology approach should be unified in all aspects as applicable and possible.

      We have unified the mouse dorsal root ganglion and mouse superior cervical ganglion methods sections. We have kept CHO cells and mouse/human neurons section separate because the methods were substantially different.

      (19) DISCUSSION. The discussion section spends half of its space trying to elaborate on possible blocking/inhibiting/modulating mechanisms for RY785. The present manuscript shows no data, at least not that I have noticed, that would evoke such discussion.

      We have shortened this section and enhanced the discussion with structural models (new Fig 9) and our functional data indicating perturbed RY785 interaction with Kv2.1/8.1.

      Manuscript revisions:

      “In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open. However, the RY785 resistance of Kv2/KvS heteromers may primarily arise from perturbed interactions with the constricted central cavity of closed channels. In homomeric Kv2.1, RY785 becomes trapped in closed channels and prevents their voltage sensors from fully activating, indicating that RY785 must interact differently with closed channels (Marquis and Sack, 2022). Here we found that Kv2.1/Kv8.1 current rapidly recovers following washout of RY785, suggesting that Kv2.1/Kv8.1 heteromers do not readily trap RY785 (Figure 2 Supplement). Overall, the structural modeling suggests that KvS subunits sterically interfere with RY785 binding to the central cavity, while functional data suggest KvS subunits disrupt RY785 trapping in closed states.”

      (20) DISCUSSION. Topics like ER retention and release upon certain conditions would be a better enrichment for the manuscript in my opinion.

      ER retention of KvS subunits is indeed an important topic! However, we have opted not to delve into it here.

      (21) DISCUSSION. Speculation about the binding site for RY on Kv2/KvS channels is also not touched by the data shown in the manuscript.

      We have shortened this section of discussion, and now present this with structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1: Kv8.1 heteromer (new Fig 9) to ground speculations. See manuscript changes noted in response to comment (19) above.

      (22) DISCUSSION. An important reference is missing in regard to stoichiometry: Bocksteins et al., 2017. This work is the only one using a non-optical technique to add knowledge to that question.

      Good point, and an excellent study we didn’t realize we’d not included before. We now include Bocksteins et al. 2017 as a reference in the Introduction.

      (23) In my opinion, allosterism and orthosterism are concepts not yet useful for the discussion of RY binding sites without even a general piece of data.

      We now include structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1: Kv8.1 heteromer (new Fig 9) to ground blocking speculations. See manuscript changes noted in response to comment (19).

      (24) The term "homogeneously susceptible" associated with a Hill slope close to 1 needs to be more elaborated.

      Thank you, we have elaborated.

      Manuscript revisions:

      “Also, the degree of resistance to RY785 may vary if Kv2:KvS subunit stoichiometry varies. With high doses of RY785, we found that the concentration-response characteristics of Kv2.1/Kv8.1 in CHO cells revealed hallmarks of a homogenous channel population with a Hill slope close to 1 (Fig 2B). However, other KvS subunits might assemble in multiple stoichiometries and result in pharmacologically-distinct heteromer populations.”
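To make the link between a Hill slope near 1 and a homogeneous channel population concrete, here is a minimal sketch rather than the fitting procedure used in the manuscript. The function names, the IC50 values, and the 50:50 mixture are hypothetical choices for illustration. A single population with a Hill slope of 1 goes from 10% to 90% inhibition over an 81-fold concentration range, whereas a mixture of two populations with well-separated IC50 values produces a shallower curve.

```python
def fraction_inhibited(conc, ic50, hill_slope=1.0):
    """Hill equation: fraction of current inhibited at a given concentration."""
    return 1.0 - 1.0 / (1.0 + (conc / ic50) ** hill_slope)


def mixed_population(conc, ic50_a, ic50_b, frac_a=0.5):
    """Inhibition of a mixture of two channel populations, each with slope 1."""
    return (frac_a * fraction_inhibited(conc, ic50_a)
            + (1.0 - frac_a) * fraction_inhibited(conc, ic50_b))


# Homogeneous population, IC50 = 1 (arbitrary concentration units).
low = fraction_inhibited(1.0 / 9.0, 1.0)   # 10% inhibition at IC50 / 9
high = fraction_inhibited(9.0, 1.0)        # 90% inhibition at 9 x IC50

# Hypothetical mixture with IC50s of 1 and 100; its midpoint is at 10.
mix_mid = mixed_population(10.0, 1.0, 100.0)
# At 9x the midpoint a homogeneous population would be 90% inhibited;
# the mixture is noticeably less inhibited, i.e. the curve is shallower.
mix_high = mixed_population(90.0, 1.0, 100.0)

print(f"homogeneous: {low:.2f} -> {high:.2f} over an 81-fold range")
print(f"mixture: {mix_mid:.2f} at midpoint, {mix_high:.2f} at 9x midpoint")
```

Fitting a single Hill function to data like the mixture would return an apparent slope below 1, which is the signature that would have suggested pharmacologically distinct heteromer populations.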

      (25) Stating the KvS are resistant to RY785 is not proper in my opinion. This opinion relates to the fact that the RY binding site in the channels is certainly not restricted to a binding site residing only on the Kv subunit.

      Good point. We have now changed phrasing to convey that KvS subunits are a component of a heteromer that imbues RY785 resistance.

      Manuscript revisions:

      “These results show that voltage-gated outward currents in cells transfected with members from each KvS subtype have decreased sensitivity to RY785 but remain sensitive to GxTX. While we did not test every KvS subunit, the ubiquitous resistance suggests that all KvS subunits may provide resistance to 1 μM RY785 yet remain sensitive to GxTX, and that RY785 resistance is a hallmark of KvS-containing channels.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that POMC neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      Innovative and novel

      Technically sound

      Well-designed

      Thorough

      Weaknesses:

      There were no major weaknesses identified.

      Reviewer #2 (Public review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R signalling on kisspeptin neurons. The initial question is a good one. Infertility associated with MC4 mutations in humans has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether there is a direct role for MC4 in regulating the HPG axis has not been thoroughly examined. Here, the researchers have assembled an elegant combination of targeted loss of function and gain of function in vivo experiments, specifically targeting MC4 expression in kisspeptin neurons. This excellent experimental design should provide compelling evidence for whether melanocortin signalling directly affects arcuate kisspeptin neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). However, the magnitude of these responses and the overall effect on fertility were relatively minor. The mice lacking MC4R in kisspeptin neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in kisspeptin cells in ex-vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different populations of kisspeptin neurons. Collectively, the study provides some novel insights into how direct actions of melanocortin signalling via the MC4 receptor in kisspeptin neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss of function/gain of function experiments provides a conceptually simple but hugely informative experimental design. This is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic kisspeptin neurons and not necessarily caused as a consequence of obesity. As concluded by the authors: "reproductive impairments observed in MC4R deficient mice, which replicate many of the conditions described in humans, are largely mediated by the direct action of melanocortins via MC4R on Kiss1 neurons and not to their obese phenotype." This is important, as it might change how such fertility problems are treated.

      I would like to see the validation experiments for the genetic manipulation studies given greater prominence in the manuscript because they are critical to interpretation. Presently, only single unquantified images are shown, and a much more comprehensive analysis should be provided.

      Weaknesses:

      (1) Given that mice lacking MC4R in kisspeptin neurons remained fertile despite some reproductive irregularities, this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system. This is now appropriately covered in the discussion.

      (2) The mechanistic studies evaluating melanocortin signalling in kisspeptin neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter how they respond to hormones and neuropeptides. Eliminating this variable makes interpretation difficult, but the authors have justified this as a reductionist approach to evaluate estradiol actions specifically. However, this does not reflect the actual complexity of reproductive function.

      For example, the authors focus on a reduced LH response to exogenous estradiol in ovariectomised mice as evidence that there might be a sub-optimal preovulatory LH surge. However, the preovulatory LH surge (in intact animals) was not measured.

      They have not assessed why some follicles ovulated, but most did not. They have focused on the possibility that the ovulation signal (LH surge) was insufficient rather than asking why some follicles responded and others did not. This suggests some issue with follicular development, likely due to changes in gonadotropin secretion during the cycle and not simply due to an insufficient LH surge.

      Reviewer #3 (Public review):

      The manuscript by Talbi R et al. generated transgenic mice to assess the reproduction function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulated Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not entirely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproductive dysfunction is one of the comorbidities of obesity. MC4R loss-of-function mutations cause an obesity phenotype and impaired reproduction. However, it is hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at puberty onset. This clearly demonstrated that MC4R signaling has a direct function in reproduction that is not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system align with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear, logical, and organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      Comments on revisions:

      The authors have addressed my comments.

      Recommendations for the authors:

      The reviewers noted that they received comments in response to their concerns, and some improvements have been made to the manuscript. However, as described below, in some cases, a rebuttal was provided, but changes were not made to the manuscript. It is suggested that these issues be addressed to improve the quality of the manuscript.

      We thank the reviewers and editor for the assessment of the manuscript and recommendations for its improvement. We have addressed the remaining comments from reviewer #2 below, and hope that they find our revisions satisfactory.

      Reviewer #2 (Recommendations for the authors):

      The manuscript convincingly shows that MC4R in kisspeptin-producing cells can influence reproductive function. This suggests that fertility problems associated with melanocortin mutations are likely due to direct effects on the reproductive systems rather than simply being side effects of the resultant obesity.

      We are pleased that this reviewer finds the data convincing and thank them for the careful review of the manuscript, which has helped to improve its published version.

      The authors have responded to the reviewer's comments and made several improvements to the manuscript.

      The authors are correct in pointing out that the POMC-Cre animals should be fine for studies involving the administration of AAVs to adult animals. I have misinterpreted how these mice were being used, and this concern is fully addressed.

      Unfortunately, in some cases, the authors rebutted the reviewer's comments but did not change the manuscript. I suggest addressing several issues in the manuscript (after all, it is not the reviewer's opinion that counts; this process is about improving the manuscript).

      (1) Validation of the KO is insufficiently reported. From the methods, it appears that this was done thoroughly, but currently, only a single image of the arcuate nucleus is shown, and no image of the AVPV is shown. There is no quantitative information provided. The authors can keep these data as supplementary material, but they should be comprehensive and convincing, as so much depends on the degree of knockout in this model. One cannot assume complete KO based simply on the relevant genetics, as there are examples in this system where different Cre lines produce different outcomes with various floxed genes in the two major populations of kisspeptin neurons. This figure should show the quantitation of the RNAscope analysis from each of the two regions regarding the percentage of kisspeptin cells showing expression of MC4R mRNA. In addition, the lack of MC4 labelling in the arcuate nucleus, outside of kisspeptin neurons, is a concern. One would expect to see AgRP or POMC cells at this level, but are they still showing expression of MC4? A single image is insufficient to be convinced of the model's efficacy.

      We appreciate the reviewer’s concerns regarding the validation of the MC4RKO model. Below, we provide clarification and additional justification for our approach.

      (1) Quantification of MC4R in the Arcuate Nucleus (ARC): As noted by the reviewer, we were unable to detect sufficient MC4R signal in the ARC of KO mice to perform meaningful quantification. This is consistent with the expected outcome of a successful MC4R deletion. Given the low endogenous expression levels of MC4R in this region, even in control animals, and the technical limitations of RNAscope in detecting very low-abundance transcripts, especially for receptors, the absence of MC4R signal in the ARC of KO mice strongly supports effective deletion. Moreover, the MC4R loxP mouse has been published and validated by many labs, including Brad Lowell’s lab, which has done extensive work using these mice for selective deletion of Mc4r from various neuronal populations such as Sim1 and Vglut2 neurons (Shah et al., 2014; de Souza Cordeiro et al., 2020). To further strengthen our validation, we provide additional images from another animal (Fig_S1) to illustrate the consistency of the MC4R KO in the ARC. These will be included as supplementary material, as suggested. Regarding AgRP and POMC neurons, MC4R is not highly expressed in these neurons (as per previous literature, e.g., Garfield et al., Nat Neurosci. 2015; Padilla SL et al, Endocrinology 2012; Henry et al, Nature, 2015). Instead, MC4R is predominantly found in downstream neurons in the paraventricular nucleus (PVN) and other hypothalamic regions (which is intact in our KO mice as shown in our validation figure). Thus, the absence of MC4R labeling in AgRP or POMC cells in our images aligns with known expression patterns and does not contradict the validity of our model.

      (2) MC4R Expression in the AVPV and OVX Effect on Kiss1 Expression: We acknowledge the reviewer’s request for MC4R expression analysis in the anteroventral periventricular nucleus (AVPV). However, due to the timing of tissue collection after ovariectomy (OVX), Kiss1 expression in the AVPV is significantly suppressed, making it technically unfeasible to perform co-staining of MC4R with Kiss1 in this region. This is a well-documented effect of estrogen depletion following OVX (Smith et al., 2005; Lehman et al., 2010). While we acknowledge that an ideal validation would include AVPV co-labeling, the experimental constraints related to OVX preclude this analysis in our dataset.

      Given these considerations and validations, we are confident that the KO is effective and specific.

      (2) Line 88: "... however, conflicting reports exist". Expand on this sentence to describe what these conflicting reports show. The authors responded to my comment but made no changes to the introduction. As a reader, I dislike being told there are conflicting reports, but then I have to go and look up the reference to see what that actual point of conflict is.

      By "conflicting reports" we meant that other studies have shown no association between MC4R and reproductive disorders; this has now been included in the revised manuscript (Line 89).

      (3) Could the authors explain how a decrease in AgRP would be interpreted as a "decrease in hypothalamic melanocortin tone" in line 142 and line 364? These overly simplistic interpretations of qPCR data detract from the overall quality of the paper.

      The reference to a decrease in melanocortin tone referred to the decrease in the expression of melanocortin receptor signaling; this has been clarified in the revised manuscript (lines 142 and 360).

      (4) Please show the individual cycle patterns for all animals, as in Figure 2B. This can be a supplemental figure, but the current bar charts are not informative.

      We respectfully disagree that the bar charts are not informative, as they include the critical statistical analysis. We have now included all individual estrous cycle data in a new, separate supplemental figure (Sup. Figure 3). Accordingly, we have removed the representative cycles from the main figures, as they now appear in the new supplemental figure, and we have changed the order of the figures in the text.

      (5) In their rebuttal, the authors state: "Mice lack true follicular and luteal phases, and therefore, it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult." I disagree, but the authors can take this position if they wish. However, they should not report the responses to exogenous estradiol in an ovariectomised mouse as a "preovulatory LH surge" (line 380). An ovariectomised mouse cannot ovulate, and the estrogen-induced LH surge is significantly different in magnitude and timing from the endogenous preovulatory LH surge (likely due to the actions of progesterone). One goal of these studies is to understand why the ovulation rate appears to be low in the MC4-KO animals. Hence, evaluating whether the preovulatory LH surge is typical is important. This has not been done. The authors have shown that the response to exogenous estradiol is sub-normal. Such an effect might lead to a reduced preovulatory LH surge, but this has not been measured.

      We appreciate this reviewer’s concern about the nature of the preovulatory LH surge. We have clarified this in the revised manuscript and described it as “an induced LH surge” throughout the text (Lines 163, 533, 6560).

      (6) I believe that the ovulation process should be considered "all or none," and I do not quite understand the rebuttal discussion. The authors describe that "numerous follicles mature at the same time....". That is not disputed. My point was that each mature follicle will receive the identical endocrine ovulatory signal (correct? Or do the authors believe something different?). If it were sufficient for one follicle to ovulate, then all of those mature follicles (the number of which will be variable between animals and between cycles) would be expected to undergo ovulation. The fact that they do not raises several possibilities. One that the authors favor is that an insufficient ovulatory signal might approach a threshold where some follicles ovulate and others do not. This possibility is supported by the apparent increase in cystic follicles, which might be preovulatory follicles that did not complete the ovulation process. Such variation might be stochastic, within normal variation for sensitivity to LH. However, it is also possible that the follicles have not matured at the same rate, perhaps influenced by abnormal secretion of LH or FSH during earlier phases of the cycle, and hence are not in the appropriate condition to respond to the ovulation signal when it arrives. Some may even have matured prematurely due to the elevated gonadotropins reported in this study. Given the data and the partial fertility, the most likely explanation is that the genetic manipulation has resulted in fewer follicles being available for ovulation due to changes in follicular development rather than a deficit of the ovulation signal, although the latter mechanism might also contribute. A third possibility is that genetic manipulation has directly affected the ovary. The authors did not answer whether Kiss1 and MC4 are co-expressed in the ovary. I think the authors might want to rule this out by showing no change in MC4R expression in the ovary.

      We thank the reviewer for this thoughtful comment and agree that these are possible outcomes. We have now acknowledged them in the Discussion.

      To answer the reviewer’s question, we have not investigated the co-expression of Kiss1 and Mc4r in the ovary. While MC4R has indeed been documented in the ovary (Chen et al. Reproduction, 2017), the changes in gonadotropin release and supporting in vitro data included in this manuscript clearly document a central effect; however, an additional effect at the level of the ovary cannot be completely ruled out. This has now been added to the discussion (Line 378-387).

      (7) Lines 390, 454 " impaired LH pulse" What was the evidence for impaired LH pulse (see figure 2D)?

      Thank you for pointing this out. This comment referred to augmented LH release. This has been corrected in the revised manuscript (Line 394).

      The paper's strengths remain, as outlined in my original review. The authors have addressed what I perceived to be weaknesses, predominantly by changing the tone of discussion and interpretation of the data. This is appropriate. I consider the focus on the LH surge as the primary mechanism too narrow, and the authors should be considering how other changes during the cycle might influence ovarian function.

      We sincerely appreciate the reviewer’s thoughtful evaluation of our manuscript and their constructive feedback. We are pleased that our revisions have addressed the perceived weaknesses and that the adjustments to the discussion and interpretation were deemed appropriate.

      We acknowledge the reviewer’s perspective on broadening the discussion beyond the LH surge to consider additional cycle-dependent influences on ovarian function. While our current study focuses on this specific mechanism, we recognize that ovarian function is influenced by multiple physiological changes throughout the cycle. We have refined our discussion to reflect this broader context and appreciate the suggestion to consider these additional factors in future studies.

      We have addressed all of the reviewer’s comments to the best of our ability and hope they find the revised manuscript satisfactory.

    1. Author response:

      The following is the authors’ response to the original reviews

      ANALYTICAL

      (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are in log-log axis and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figure 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

      ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric.

      Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We have used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. In the revised Figure 3, specifically 3C and 3D, we have plotted trials to acquisition using decision criteria equivalent to the one used by Gibbon and Balsam. The criterion they used (at least one peck at the response key on at least 3 out of 4 consecutive trials) cannot be directly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during the CS and the baseline response rate, rather than just counting responses during the CS. We have used two approaches to adapt the Gibbon and Balsam criterion to our data. One approach, plotted in Figure 3C, uses a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, adopting a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p<.3125). The second method (Figure 3D) estimates the nDkl for the criterion used by Gibbon and Balsam and then applies this criterion to the nDkl for our data. To estimate the nDkl of Gibbon and Balsam’s data, we have assumed there are no responses in the inter-trial interval and the response probability during the CS must be at least 0.75 (their criterion of at least 3 responses out of 4 trials). The nDkl for this difference is 2.2 (odds ratio 27:1). We have then applied this criterion to the nDkl obtained from our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates. These two analyses have been added to the manuscript to replace those previously shown in Figures 3B and 3C.
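
      As an illustration only, one plausible reading of the .3125 threshold (an assumption; the response does not spell out its derivation) is that it equals the probability of at least 3 successes in 4 trials when success and failure are equally likely. The helper name `binomial_tail` is hypothetical and is not taken from the authors' code.

```python
from math import comb


def binomial_tail(k_min: int, n: int, p: float) -> float:
    """Probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Assumption: under the null, each trial is a fair coin flip (p = 0.5).
# "At least 3 of 4" then has probability (4 + 1) / 16.
chance_3_of_4 = binomial_tail(3, 4, 0.5)
print(chance_3_of_4)  # 0.3125
```

      Under this reading, a signed rank test with p < .3125 is no more lenient than the pigeons' 3-out-of-4 rule would be if pecking were at chance.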

      (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

      If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ from pre and post-cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

      A single regression line, as shown in Figure 5, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
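
      A minimal sketch of what fitting the one-parameter model involves, assuming ordinary least squares in the log domain. With the slope fixed at 1, the only free parameter is the intercept, which is the mean of log(response rate) minus log(reinforcement rate). The function name and the numbers are hypothetical and are not data from the study.

```python
import math


def fit_scalar_model(reinf_rates, resp_rates):
    """Fit log(resp) = log(k) + 1.0 * log(reinf) by least squares.

    Returns (k, r_squared), with r_squared computed in the log domain.
    """
    log_x = [math.log(x) for x in reinf_rates]
    log_y = [math.log(y) for y in resp_rates]
    # Slope is fixed at 1, so the least-squares intercept is the mean offset.
    log_k = sum(ly - lx for lx, ly in zip(log_x, log_y)) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    ss_res = sum((ly - (log_k + lx)) ** 2 for lx, ly in zip(log_x, log_y))
    ss_tot = sum((ly - mean_y) ** 2 for ly in log_y)
    return math.exp(log_k), 1.0 - ss_res / ss_tot


# Made-up rates: responding is exactly 5 times the reinforcement rate,
# so the fit should recover a scale factor near 5 with R^2 near 1.
reinf = [0.001, 0.01, 0.1, 1.0]
resp = [5 * r for r in reinf]
k, r2 = fit_scalar_model(reinf, resp)
```

      Back in the linear domain the fitted model is simply response rate = k × reinforcement rate, a line through the origin.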

      If separate regression lines are fitted to the CS and ITI data, there is a small increase in explained variance (R² = 0.82). These regression lines have been added to the plot in the revised manuscript (Figure 5). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum likelihoods of the model’s fits to all possible sets of N data with B precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post-hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between a logarithmic model and a power-function model. Log regressions must be concave, whereas power function regressions can be concave, linear, or convex, yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits (high maximum likelihood) to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y.  Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
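
      The point that the BIC only counts parameters can be made concrete with a small sketch. It assumes least-squares fits with Gaussian errors, for which the BIC reduces (up to an additive constant shared by all models fitted to the same data) to n·ln(RSS/n) + k·ln(n). The sample size and residual sums of squares below are hypothetical, chosen only to mimic an R² of about 0.80 versus 0.82; they are not values from the study.

```python
import math


def bic_gaussian(rss: float, n: int, k: int) -> float:
    """BIC for a least-squares fit with Gaussian errors, up to a constant
    that is identical for every model fitted to the same n data points."""
    return n * math.log(rss / n) + k * math.log(n)


n = 200                 # hypothetical number of data points
rss_one_param = 20.0    # hypothetical: total SS of 100, R^2 = 0.80
rss_four_param = 18.0   # hypothetical: total SS of 100, R^2 = 0.82

bic_one = bic_gaussian(rss_one_param, n, k=1)
bic_four = bic_gaussian(rss_four_param, n, k=4)

# The complexity penalty depends only on the parameter count k,
# not on how flexibly those parameters let the model bend.
penalty_gap = (4 - 1) * math.log(n)
```

      With these made-up numbers the lower (better) BIC goes to the 4-parameter model, even though the gain in fit is small. Any two models with the same k receive the same penalty, which is the limitation the MDL argument above targets.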

      The data in Figures 4 and 5 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results. We now note this in the revised manuscript. The data for terminal responding by all rats, averaged over both the last 5 sessions and last 10 sessions, can be downloaded from https://osf.io/vmwzr/

      Finally, as noted by the reviewers, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

      One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R.  Gallistel, Balsam, & Fairhurst, 2004). The new more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8 (now 6 and 7), extending on that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022) it may not distinguish between a gradual change and multiple step-like changes in responding and cannot account for decreases in response rate.

      The other analytic method we used relies on the information theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterized by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy tailed and have peculiar properties (as witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters), but we are unable at this time to propound an explanation of the third trend.

      (4) Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20 s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15 s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20 s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and if any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

      There was an error in the description provided in the original text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period. The statement about the pre-CS measure has been corrected in the revised manuscript.

      (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) that the model discussed in Figure 3 fits on the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very unlikely to be independent observations.

      Details about model fitting have been added to the revision. The question about fitting a single model or multiple models to the data in Figure 6 (now 5) is addressed in response 2 above. In Figure 5, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?***

      It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements (failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter) can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L.J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).
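
      The flat hazard function can be checked numerically. The sketch below uses the standard fact that waiting times in a Poisson process are exponentially distributed, and computes the hazard as density divided by survival. The rate value is hypothetical.

```python
import math


def exponential_hazard(t: float, rate: float) -> float:
    """Hazard h(t) = f(t) / S(t) for exponentially distributed waiting times,
    where f is the density and S is the survival function."""
    density = rate * math.exp(-rate * t)
    survival = math.exp(-rate * t)
    return density / survival


rate = 0.05  # hypothetical reinforcements per second
hazards = [exponential_hazard(t, rate) for t in (0.0, 1.0, 10.0, 100.0)]
```

      Every entry of `hazards` equals `rate` to within floating-point error: however long the subject has waited, the next reinforcement is no more and no less imminent. There is therefore no moment at which a reinforcement can be said to have failed to occur.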

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many—including the new Nobelist, Geoffrey Hinton— would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates.

      I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

      The seemingly contradictory evidence of unblocking and trans-reinforcer blocking by Wagner and by Pearce and Ganesan cited above will be hard for any theory to accommodate. It will likely depend on what features of the US are represented in the conditioned response.

      RET predicts one-trial overshadowing, as anyone may verify in a scientific programming language because it has no free parameters; hence, no wiggle room. Overtraining reversal effects appear to depend on aspects of the subjects’ experience other than the rate of reinforcement. It seems unlikely that RET can proffer an explanation.

      Various information-theoretic calculations give pretty good quantitative fits to the relatively few parametric studies of extinction and the partial-reinforcement extinction effect (see Gallistel (2012, Figs 3 & 4); Wilkes & Gallistel (2016, Fig 6); and Gallistel (2025, under review, Fig 6)). The approach has not been applied to latent inhibition, in part for want of parametric data. However, clearly one should not attribute a negative rate to a context in which the subject had never been reinforced. An explanation, if it exists, would have to turn on the effect of that long period on initial rate estimates AND on evidence of a change in rate, as of the first reinforcement.

      Recommendations for authors:

      MINOR POINTS

      (1) It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4 to clarify the spread of the distribution of the data observed across the plots in Figure 3 uses the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears that the data plotted in Figure 4 corresponds to Figure 3A and 3B rather than the odds 10:1 data indicated in the text.

      Figure 3 has changed as already described. The data previously plotted in Figure 4 are now shown in 3B and correspond to those plotted in Figure 3A.

      (2) Log(T) was not correlated with trials to criterion. If trials to criterion is inversely proportional to log(C/T) and C is uncorrelated with T, shouldn't trials to criterion be correlated with log(T)? Is this merely a matter of low statistical power?

      Yes. There is a small, but statistically non-significant, correlation between log(T) and trials to criterion, r = 0.35, p = .22. That correlation drops to .08 (p = .8) after factoring out log(C/T), which demonstrates that the weak correlation between log(T) and trials to criterion is based on the correlation between log(T) and log(C/T).
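
      "Factoring out" a third variable is commonly done with a first-order partial correlation; whether that is the exact procedure used here is an assumption. The sketch below implements the standard formula. The two correlations involving log(C/T) are hypothetical placeholders, since the response does not report them.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z,
    computed from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# x = log(T), y = trials to criterion, z = log(C/T).
# r_xy is the reported 0.35; r_xz and r_yz are made-up illustrative values.
example = partial_correlation(r_xy=0.35, r_xz=-0.5, r_yz=-0.6)
```

      When the raw correlation r_xy is close to the product r_xz × r_yz, as in this made-up case, the partial correlation shrinks toward zero. That is the pattern described above.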

      (3) The rationale for the removal of the high information condition samples in the Fig 8 "Slope" plot appears to be weak. Can the authors justify this choice better? If all data are included, the relationship is clearly different from that shown in the plot.

      We have now reported correlations that include those 3 groups but noted that the correlations are largely driven by the much lower slope values of those 3 groups, which are likely an artefact of their smaller number of trials. We use this to justify a second set of correlations that excludes those 3 groups.

      (4) The discussion states that there is at most one free parameter constrained by the data - the constant of proportionality for response rate. However, there is also another free parameter constrained by data-the informativeness at which expected trials to acquisition is 1.

      I think this comment is referring to two different sets of data. The constant of proportionality of the response rate refers to the scalar relationship between reinforcement rate and terminal response rate shown in Figure 5. The other parameter, the informativeness when trials to acquisition equals 1, describes the intercept of the regression line in Figure 1 (and 3).

      (5) The authors state that the measurement of available information is not often clear. Given this, how is contingency measurable based on the authors' framework?

      (6) Based on the variables provided in Supplementary File 3, containing the acquisition data, we were unable to reproduce the values reported in the analysis of Figure 3.

      Figure 3 has changed, using new criteria for trials to acquisition that attempt to match the criterion used by Gibbon and Balsam. The data on which these figures are based has been uploaded into OSF.

      GRAPHICAL AND TYPOGRAPHICAL

      (1) Y-axis labels in Figure 1 are not appropriately placed. 0 is sitting next to 0.1. 0 should sit at the bottom of the y-axis.

      If this comment refers to the 0 sitting above an arrow in the top right corner of the plot, this is not misaligned. The arrow pointing to zero is used to indicate that this axis approaches zero in the upward direction. 0 should not be aligned to a value on the axis since a learning rate of zero would indicate an infinite number of learning trials. The caption has been edited to explain this more clearly.

      (2) Typo, Page 6, Final Paragraph, line 4. "Fourteen groups of rats were trained with for 42 session"

      Corrected. Thank you.

      (3) Figure 3 caption: Typo, should probably be "Number of trials to acquisition"?

      This change has now been made. The axis shows reinforcements to acquisition to be consistent with Gibbon and Balsam, but trials and number of reinforcements are identical in our 100% reinforcement schedule.

      (4) Typo Page 17 Line 1: "Important pieces evidence about".

      Corrected. Thank you.

      (5) Consider consistent usage of symbols/terms throughout the manuscript (e.g. Page 22, final paragraph: "iota = 2" is used instead of the corresponding symbol that has been used throughout).

      Changed.

      (6) Typo Page 28, Paragraph 1, Line 9: "We used a one-sample t-test using to identify when this".

      This section of text has been changed to reflect the new analysis used for the data in Figure 3.

      (7) Typo Page 29, Paragraph 1, Line 2: "problematic in cases where one of both rates are undefined" either typo or unclear phrasing.

      “of” has been corrected to “or”

      (8) Typo Page 30: Equation 3 appears to have an error and is not consistent with the initial printing of Equation 3 in the manuscript.

      The typo in the initial expression of Eq 3 (page 23) has been corrected.

      (9) Typo Page 33, Line 5: "Figures 12".

      Corrected.

      (10) Typo Page 34, Line 10: "and the 5 the increasingly"? Should this be "the 5 points that"?

      Corrected.

      (11) Typo Page 35, Paragraph 2: "estimate of the onset of conditioned is the trial after which".

      Corrected.

      (12) Clarify: Page 35, final paragraph: it is stated that four-panel figures are included for each subject in the Supplementary files, but each subject has a six-panel figure in the Supplementary file.

      The text now clarifies that the 4-panel figures are included within the 6-panel figures in the Supplementary materials.

      (13) It is hard to identify the different groups in Figure 2 (Plot 15).

      The figure is simply intended to show that responding across seconds within the trial is relatively flat for each group. Individuation of specific groups is not particularly important.

      (14) It appears that the numbering on the y-axis is misaligned in Figure 2 relative to the corresponding points on the scale (unless I have misunderstood these values and the response rate measure to the ITI can drop below 0?).

      The numbers on the Y axes had become misaligned. That has now been corrected.

      (15) Please include the data from Figure 3A in the spreadsheet supplementary file 3. If it has already been included as one of the columns of data, please consider a clearer/consistent description of the relevant column variable in Supplementary File 1.

      The data from Figure 3 are now available from the linked OSF site, referenced in the manuscript.

      (16) Errors in supplementary data spreadsheets such that the C/T values are not consistent with those provided in Table 1 (C/T values of 4.5, 54, 180, and 300 are slightly different values in these spreadsheets). A similar error/mismatch appears to have occurred in the C/T labels for Figures (e.g. Figure 10) and the individual supplementary figures.

      The C/T values on the figures in the supplementary materials have been corrected and are now consistent with those in Table 1.

      (17) Currently the analysis and code provided at https://osf.io/vmwzr/ are not accessible without requesting access from the author. Please consider making these openly available without requiring a request for authorization. As such, a number of recommendations made here may already have been addressed by the data and code deposited on OSF. Apologies for any redundant recommendations.

      Data and code are now available at the OSF site, which has been made public and no longer requires an access request.

      (18) Please consider a clearer and more specific reference to supplementary materials. Currently, the reader is required to search through 4 separate supplementary files to identify what is being discussed/referenced in the text (e.g. Page 18, final line: "see Supplementary Materials" could simply be "see Figure S1").

      We have added specific page numbers in references to the Supplementary Materials.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes a novel magnetic steering technique to target human adipose derived mesenchymal stem cells (hAMSC) or induced pluripotent stem cells to the TM (iPSC-TM). The authors show that delivery of the stem cells lowered IOP, increased outflow facility, and increased TM cellularity.

      Strengths:

      The technique is novel and shows promise as a novel therapeutic to lower IOP in glaucoma. hAMSC are able to lower IOP below the baseline as well as increase outflow facility above baseline with no tumorigenicity. These data will have a positive impact on the field and will guide further research using hAMSC in glaucoma models.

      Weaknesses:

      The transgenic mouse model of glaucoma the authors used did not show ocular hypertensive phenotypes at 6-7 months of age as previously reported. Therefore, if there is no pathology in these animals the authors did not show a restoration of function, but rather a decrease in pressure below normal IOP.

      We appreciate the reviewer’s feedback and agree with the statement of weakness. Accordingly, we have revised the language to improve clarity. Specifically, all references to "restoration of IOP" or "restoration of conventional outflow function" have been replaced with more precise phrases, in the following locations: 

      • lines 2-3 (title): Magnetically steered cell therapy for reduction of intraocular pressure as a treatment strategy for open-angle glaucoma

      • lines 36-8 (abstract): We observed a 4.5 [3.1, 6.0] mmHg or 27% reduction in intraocular pressure (IOP) for nine months after a single dose of only 1500 magnetically-steered hAMSCs, explained by increased conventional outflow facility and associated with higher TM cellularity.

      • lines 45-6 (one-sentence summary): A novel magnetic cell therapy provided effective intraocular pressure reduction in mice, motivating future translational studies.

      • lines 123-4 (introduction): Despite the absence of ocular hypertension in our MYOC<sup>Y437H</sup> mice, our data demonstrate sustained IOP lowering and a significant benefit of magnetic cell steering in the eye, particularly for hAMSCs, strongly indicating further translational potential.

      • line 207 (results): The observed reductions in IOP and increases in outflow facility after delivery of both cell types suggested functional changes in the conventional outflow pathway.

      • lines 509-10 (discussion): In summary, this work shows the effectiveness of our novel magnetic TM cell therapy approach for long-term IOP reduction through functional changes in the conventional outflow pathway.

      It is very important to note that at the 23rd annual Trabecular Meshwork Study Club meeting (San Diego, December 2024), Dr. Zode, the lead author of reference 26 originally describing the transgenic myocilin mouse model, announced during his talk that this model no longer demonstrates the glaucomatous phenotype in his hands, which incidentally has motivated him to create a new, CRISPR MYOC mouse model. Dr. Zode also stated that he was uncertain of the reason for this loss of phenotype. His observation is consistent with our report. However, other investigators continue to observe the desired phenotype in their colonies of this mouse (Dr. Wei Zhu, personal communication). Continued use of this mouse model should therefore be approached with caution. 

      Reviewer #2 (Public review):

      Summary:

      This observational study investigates the efficacy of intracameral injected human stem cells as a means to re-functionalize the trabecular meshwork for the restoration of intraocular pressure homeostasis. Using a murine model of glaucoma, human adipose-derived mesenchymal stem cells are shown to be biologically safer and functionally superior at eliciting a sustained reduction in intraocular pressure (IOP). The authors conclude that the use of human adipose-derived mesenchymal stem cells has the potential for long-term treatment of ocular hypertension in glaucoma.

      Strengths:

      A noted strength is the use of a magnetic steering technique to direct injected stem cells to the iridocorneal angle. An additional strength is the comparison of efficacy between two distinct sources of stem cells: human adipose-derived mesenchymal vs. induced pluripotent cell derivatives. Utilizing both in vivo and ex vivo methodology coupled with histological evidence of introduced stem cell localization provides a consistent and compelling argument for a sustainable impact exogenous stem cells may have on the re-functionalization of a pathologically compromised TM.

      Weaknesses:

      A noted weakness of the study, as pointed out by the authors, includes the unanticipated failure of the genetic model to develop glaucoma-related pathology (elevated IOP, TM cell changes). While this is most unfortunate, it does temper the conclusion that exogenous human adipose derived mesenchymal stem cells may restore TM cell function. Given that TM cell function was not altered in their genetic model, it is difficult to say with any certainty that the introduced stem cells would be capable of restoring pathologically altered TM function. A restoration effect remains to be seen. 

      We acknowledge that the phrase “restoration of TM function” is not fully supported by our results, given the absence of ocular hypertension in our animal model. Accordingly, we have revised the language to more precisely describe our findings. For specific details regarding these changes, please refer to our response to Reviewer 1’s public comments above.

      Another noted complication to these findings is the observation that sham intracameral-injected saline control animals all showed elevated IOP and reduced outflow facility, compared to WT or Tg untreated animals, which allowed for more robust statistically significant outcomes. Additional comments/concerns that the authors may wish to address are elaborated in the Private Review section.

      We agree that sham-injected animals tended to have higher average IOPs than transgenic animals in our study. However, these differences did not reach statistical significance and therefore remain inconclusive. Further, an increase in IOP following placebo injection has been previously reported (Zhu et al., 2016). 

      Prompted by the Referee’s comments and also a private comment from Referee 1, we further investigated this effect by analyzing IOP in uninjected contralateral eyes at the mid-term time point and comparing the IOPs in these eyes to other cohorts, as now presented as additional data in Supplementary Tables 1 and 2 and Supplementary Figure 4 (see below). In brief, the uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Additionally, we cannot rule out potential contralateral effects induced by the injections.
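
      The comparison described above can be sketched as a simple interval-overlap check. This is a minimal illustration only: it assumes the bracketed ranges are confidence intervals, and overlap is a rough descriptive check rather than the statistical test applied in the manuscript (overlapping intervals do not by themselves establish a lack of significance). The group labels and the helper `intervals_overlap` are hypothetical names introduced here.

```python
# Illustrative sketch: IOP estimates (mmHg) as quoted in the response,
# stored as (estimate, lower bound, upper bound).
# Assumption: the bracketed ranges are confidence intervals.
groups = {
    "Tg untreated (6-7 months)": (15.4, 14.7, 16.1),
    "Contralateral uninjected Tg (10 months)": (16.5, 15.9, 17.1),
    "Sham-injected": (16.9, 15.5, 18.2),
}


def intervals_overlap(a, b):
    """Return True if the closed intervals of a and b share at least one point."""
    _, a_lo, a_hi = a
    _, b_lo, b_hi = b
    return a_lo <= b_hi and b_lo <= a_hi


# Compare every pair of groups once.
names = list(groups)
overlaps = {}
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        pair = (names[i], names[j])
        overlaps[pair] = intervals_overlap(groups[names[i]], groups[names[j]])

for (first, second), result in overlaps.items():
    print(f"{first} vs {second}: overlap = {result}")
```

      Under these assumptions every pair of intervals overlaps, which is in keeping with the statement that none of the differences reached statistical significance.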

      Regarding the best way to assess the effect of cell treatment, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control (vehicle)-injected eyes, since this provides the most direct accounting for the effects of injection itself on IOP. Other comparisons, such as WT or untreated Tg eyes vs. cell-treated eyes, are interesting but harder to interpret. However, in response to the referee’s comment, we have added comparisons between cell-treated groups and untreated Tg eyes to Table 2, adjusting the post-hoc corrections accordingly. All hAMSC treated groups show statistically significant decrease in IOP even compared to Tg untreated eyes, while iPSC-TMs fail to reach such significance.

      The following changes were made to the manuscript:

      Lines 326 et seq.: Eyes subjected to saline injection exhibited marginally higher IOPs and lower outflow facilities on average, in comparison to the transgenic animals at baseline. However, due to the lack of statistical significance in these differences and the inherent age difference between the saline-injected animals and the non-injected controls at baseline, no conclusive inference can be drawn regarding the effect of saline injection. To investigate this phenomenon further, we also analyzed IOPs in uninjected contralateral eyes at the midterm time point (Supplementary Tables 1 and 2, Supplementary Figure 4). The uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham-injected group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Of note, contralateral hypertension has been previously reported after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles (34), and we similarly cannot definitively rule out potential contralateral effects induced by our stem cell injections. Thus, we cannot draw any definite conclusions from these additional IOP comparisons at this time.

      Reviewer #3 (Public review):

      Summary:

      The purpose of the current manuscript was to investigate a magnetic cell steering technique for efficiency and tissue-specific targeting, using two types of stem cells, in a mouse model of glaucoma. As the authors point out, trabecular meshwork (TM) cell therapy is an active area of research for treating elevated intraocular pressure as observed in glaucoma. Thus, further studies determining the ideal cell choice for TM cell therapy are warranted. The experimental protocol of the manuscript involved the injection of either human adipose derived mesenchymal stem cells (hAMSCs) or induced pluripotent cell derivatives (iPSC-TM cells) into a previously reported mouse glaucoma model, the transgenic MYOC<sup>Y437H</sup> mice and wild-type littermates, followed by the magnetic cell steering. Numerous outcome measures were assessed and quantified including IOP, outflow facility, TM cellularity, retention of stem cells, and the inner wall BM of Schlemm's canal.

      Strengths:

      All of these analyses were carefully carried out and appropriate statistical methods were employed. The study has clearly shown that the hAMSCs are the cells of choice over the iPSC-TM cells, the latter of which caused tumors in the anterior chamber. The hAMSCs were shown to be retained in the anterior segment over time and this resulted in increased cellular density in the TM region and a reduction in IOP and outflow facility. These are all interesting findings and there is substantial data to support it.

      Weaknesses:

      However, where the study falls short is in the MYOC<sup>Y437H</sup> mouse model of glaucoma that was employed. The authors clearly state that a major limitation of the study is that this model, in their hands, did not exhibit glaucomatous features as previously reported, such as a significant increase in IOP, which was part of the overall purpose of the study. The authors state that it is possible that "the transgene was silenced in the original breeders". The authors did not show PCR, western blot, or immuno of angle tissue of the tg to determine transgenic expression (increased expression of MYOC was shown in the angle tissue of the transgenics in the original paper by Zode et al, 2011). This should be investigated given that these mice were rederived. Thus, it is clearly possible that these are not transgenic mice.

      All MYOC mice that were used in this study were genotyped and confirmed to carry the transgene, as noted in the original version of the paper (see lines 590-2). However, the transgene seems not to have been active, based on the lack of ocular hypertension as well as the lack of differences in supporting endpoints such as outflow facility and TM cellularity. While it would have been possible to carry out the recommended assays to investigate the root cause of this loss of phenotype, this was not an objective of our study. We therefore focus here simply on communicating the observed loss of phenotype to readers. We also refer the referee to the final paragraph of our response to Referee 1.

      If indeed they are transgenics, the authors may want to consider the fact that in the Zode paper, the most significant IOP elevation in the mutant mice was observed at night and thus this could be examined by the authors. 

      This is a good point. However, while the dark-phase IOP does exhibit a distinctly larger elevation (as previously observed in hypertonic saline sclerosis), Zode et al. also reported a notable 3 mmHg IOP increase during the light phase. The complete absence of such daytime (light phase) IOP elevation in our animals diminished our enthusiasm for pursuing dark-phase IOP measurements.

      Other glaucomatous features of these mice could also have been investigated such as loss of RGCs, to further determine their transgenic phenotype. 

      We agree that these other phenotypes could be studied, but in the absence of any detectable IOP elevation (and thus lack of mechanical insult on RGC axons), loss of RGC is extremely unlikely. We also note that the loss of retinal ganglion cells (RGCs) in the Myocilin model remains a subject of controversy. For example, despite a significant increase in IOP (>10 mmHg) in this model across four mouse strains, three, including C57BL6/J, did not exhibit any signs of optic nerve damage (McDowell et al., 2012). In contrast, Zhu et al. observed considerable nerve damage in this model, which was reversed following iPSC-TM cell transplantation (Zhu et al., 2016). Given these conflicting findings, we directed our efforts toward outcome measures directly related to aqueous humor dynamics.

      Finally, while increased cellular density in the TM region was observed, proliferative markers could be employed to determine if the transplanted cells are proliferating.

      We agree that identifying the source of the increased trabecular meshwork (TM) cellularity we observed is interesting and we plan to pursue that in future studies. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The sham-injected transgenic animals showed elevated IOP 3-4 weeks after the baseline measurements in the transgenic mice. The authors justify this may be due to the increase in age in these animals. However, this seems unlikely due to the short duration of time between measurement of the baseline IOP and the Short time point (3-4 weeks). The authors do not provide IOP data for any WT sham injected eyes or naïve Tg eyes at these time points. These data are essential to determine if the elevation is due to the sham injection, age, or the transgene. Could it be that the IOP in this cohort of Tg mice didn't increase until 7-8 months of age instead of 6-7 months of age? The methods state only unilateral injections of the stem cells were done so it is assumed the contralateral eye was uninjected. What was the IOP in these eyes? These data would clarify the confusion in the data from sham-injected animals compared to baseline (naive) measurements.

      We agree that the average IOP in saline-injected groups is higher than in WT or non-treated Tg mice, although the difference is inconclusive due to a lack of statistical significance. It is important to note, however, that this difference is subtle and not comparable to the 3 mmHg light-phase IOP elevation previously observed in this model (Zode et al., 2011). 

      We appreciate the reviewer’s suggestion to include IOP data from the contralateral uninjected eyes, and we have now provided this information along with the comparative statistics in the supplementary materials. Additional details can be found in our response to a similar comment from Reviewer 2’s public review. In summary, the IOP difference in contralateral non-injected ten-month-old transgenic eyes was even smaller than in the original Tg group. IOP elevation following saline injection in mice has been reported previously (Zhu et al., 2016). As a potential confounding factor, we highlight possible contralateral effects of the injection itself (which is why we initially did not analyze IOP in the contralateral eyes).

      The hAMSC-treated eyes appear to lower IOP even from baseline (although stats were only provided compared to the sham-injected eyes, which as stated above appear to have increased).

      However, the iPSC-TM-treated eyes had IOPs equal to that of the baseline measurements taken 3 weeks prior. The significance is coming from the "sham-treated" eyes which had elevated IOPs. The controls listed above should be included to make these conclusions.

      The reviewer makes an astute observation. Please refer to our response to a similar observation by Reviewer 2 under public reviews, where we provide and discuss the comparative statistics noted by the reviewer. However, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control-injected eyes. 

      If the transgenic mouse model truly did not have a phenotype, then the authors are testing the ability of the stem cells to lower IOP from baseline normal pressures. Therefore, the authors are not "restoring function of the conventional outflow pathway" as there is no damage to begin with. The language in the manuscript should be corrected to reflect this if the transgenics have no phenotype.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to your public review.

      The authors noted in the iPSC-TM-treated eyes there was a high rate of tumorigenicity. If the magnetic steering of these cells is specific and targeted to the TM, why do the tumors form near the central iris?

      While magnetic steering is more specific to the trabecular meshwork (TM) than previously used approaches (Bahrani Fard et al., 2023), it is not perfect, and a modest amount of off-target delivery to the iris, including its central portion, still occurs. Apparently, it took only a few mis-directed iPSC-TM cells to lead to tumors in this work, which is a serious concern for future translational approaches.

      Reviewer #2 (Recommendations for the authors):

      (1) It appears that mice were injected unilaterally (Line 590). I may have missed this, but was the companion un-injected eye analyzed in this study? If not analyzed, was there a confounding concern or limitation that necessitated omitting this possible control option?

      Contralateral effects, such as hypertension in the untreated eye after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles, have previously been reported in the literature (Li et al., 2019) and also reported anecdotally by other leaders in the field to the senior authors, which is why we did not initially analyze contralateral eyes in this study. However, prompted by this comment and others, we have now included the IOP measurements for contralateral uninjected ten-month-old transgenic eyes in the supplementary materials. For further details, please refer to our response to your public review.

      (2) Were all these mice the same gender? Would gender be expected to alter the findings of this study?

      Animals of both sexes were randomly chosen and included in the study. We added the following statement to the Materials and Methods section (line 530): After breeding and genotyping, mice, regardless of sex, were maintained to age 6-7 months, when transgenic animals were expected to have developed a POAG phenotype.

      (3) As noted in the public review, the use of PBS for a control seems to have resulted in a slight elevation in IOP (Figure 2) as well as a reduction in outflow facility (Figure 3B) when compared to WT or Tg mice. Was this difference statistically significant? 

      The differences between the sham (saline)-injected groups at any time point and untreated Tg mice did not reach statistical significance for IOP, facility, or TM cellularity, and for facility they did not even show clear trends. For example, WT mice had, on average, 0.2 mmHg higher IOP and 0.6 nl/min/mmHg greater facility than the Tg group. Meanwhile, on a similar scale, the long-term sham group exhibited 0.4 nl/min/mmHg higher facility compared to the Tg group. As the statistical tests indicate, these differences should be interpreted more as noise than meaningful signal.

      If so, then it should be noted as to whether the observed decrease in IOP following stem cell injection remained statistically significant when compared to these un-injected control animals. If significance was lost, then this should be appropriately noted and discussed. It is not apparently obvious why sham controls should have elevated IOP. This is a design and statistical concern.

      Please refer to our response to a similar observation by Reviewer 1. We believe that comparing the treatment (cell suspension in saline) with its age-matched vehicle (saline) is the appropriate approach which maintains rigor by most directly accounting for the effects of injection. 

      (4) The tonicity of the PBS used as a vehicle control was not stated and I did not see within the methods whether the stem cells were suspended using this same PBS vehicle. I assume isotonic phosphate buffered saline was used and that the stem cells were resuspended using the same sterile PBS. 

      Thanks for catching this. We added “sterile PBS (1X, Thermo Fisher Scientific, Waltham, MA)” to the Methods section of the manuscript (line 567). 

      With regards to using PBS as an injection control, I wonder if a better comparable control might have been to use mesenchymal stem cells that were rendered incapable of proliferating prior to intracameral injection. This, of course, addresses the unexplained mechanism(s) by which mesenchymal stem cells elicit a decrease in IOP.

      This is an interesting idea, and represents another level of control. However, we explicitly chose not to use non-proliferating hAMSCs as a control, for several reasons. First, a saline injection is the simplest control, and in this initial study with multiple groups we did not feel another experimental group should be added. Second, this control would not rule out paracrine effects from injected cells, which our data suggested are an important effect. Third, rendering injected cells truly non-proliferative could introduce unwanted/unknown phenotypes in these cells that would need to be carefully characterized. That being said, if an efficient method could be developed to render an entire population of these cells irreversibly non-proliferating, the reviewer’s suggestion would be worth pursuing to better understand the mechanism of TM cell therapies.

      (5) As noted in Figure 4C, TM cellular density as quantified was not altered in the sham control, so a loss of cellular density can not explain the elevated IOP with this group. Injecting viable (not determined?) mesenchymal stem cells did show, over the short term, a noted increase in TM cellular density. 

      Thank you for noting this. We agree that changes in cell density do not explain the mild IOP elevation in the sham group. As the referee certainly is aware, there are multiple reasons that IOP can be elevated (changes in trabecular meshwork extracellular matrix, changes in trabecular meshwork stiffness) that are not necessarily related to cell density. Since we do not know definitively the cause of this mild elevation, we would prefer to not speculate about it in the manuscript.

      Thanks for pointing out our omission of a statement about injected cell viability. We have now included the following statement in the Materials and Methods section (564-566): “For all the experiments where animals received hAMSC, cell count and >90% viability was verified using a Countess II Automated Cell Counter (Thermo Fisher Scientific, Waltham, MA).”

      I'm confused, as clearly stated (Lines 431-432), mesenchymal stem cells accumulated close to, but not within, the TM. How is it that TM cellular density increased if these stem cells did not enter the TM? The authors may wish to clarify this distinction. Given that mesenchymal stem cells did not increase the risk of tumorigenicity, do the authors have any evidence that these cells actually proliferated post-injection, or did they undergo senescence, thereby displaying a senescence-associated secretory phenotype as a source of paracrine support?

      As the reviewer correctly noted, our observations show that hAMSCs primarily accumulated close to, but outside, the TM (likely caught up in the pectinate ligaments). Based on observations of increased TM cellularity, we think that the most likely explanation of these findings is paracrine signaling, as the reviewer suggests and which was discussed at length in the original version of the manuscript (lines 453-477). 

      We agree that, despite observing little signal from hAMSCs within the TM, labeling with proliferation markers (e.g., Ki-67) and searching for co-localization with exogenous cells, and/or labeling for senescence markers would have provided more mechanistic information. This is an excellent topic for future study, which we plan to pursue, but was outside the scope of this study. 

      (6) As noted in the public review, I think it is a bit of a stretch to even suggest that the findings of this study support stem cell restoration of TM function given that the model apparently did not produce TM cell dysfunction as anticipated. A restoration effect remains to be seen.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to Reviewer 1’s public comment.

      Reviewer #3 (Recommendations for the authors):

      (1) Show PCR, western blot, or immuno of angle tissue of the MYOC tg to confirm transgenic expression.

      (2) Examine the IOP of mice at night.

      (3) Investigate other glaucomatous features in the mice to determine if they have any of the transgenic phenotypes previously reported.

      (4) Examine proliferative markers in the TM region of angles injected with stem cells.

      Please see our responses to all four of these comments in the public section.

      Bibliography (for this response letter only)

      Bahrani Fard, M.R., Chan, J., Sanchez Rodriguez, G., Yonk, M., Kuturu, S.R., Read, A.T., Emelianov, S.Y., Kuehn, M.H., Ethier, C.R., 2023. Improved magnetic delivery of cells to the trabecular meshwork in mice. Exp. Eye Res. 234, 109602. https://doi.org/10.1016/j.exer.2023.109602

      Li, G., Lee, C., Agrahari, V., Wang, K., Navarro, I., Sherwood, J.M., Crews, K., Farsiu, S., Gonzalez, P., Lin, C.-W., Mitra, A.K., Ethier, C.R., Stamer, W.D., 2019. In vivo measurement of trabecular meshwork stiffness in a corticosteroid-induced ocular hypertensive mouse model. Proc. Natl. Acad. Sci. U. S. A. 116, 1714–1722. https://doi.org/10.1073/pnas.1814889116

      Zhu, W., Gramlich, O.W., Laboissonniere, L., Jain, A., Sheffield, V.C., Trimarchi, J.M., Tucker, B.A., Kuehn, M.H., 2016. Transplantation of iPSC-derived TM cells rescues glaucoma phenotypes in vivo. Proc. Natl. Acad. Sci. U. S. A. 113, E3492–E3500.

      Zode, G.S., Kuehn, M.H., Nishimura, D.Y., Searby, C.C., Mohan, K., Grozdanic, S.D., Bugge, K., Anderson, M.G., Clark, A.F., Stone, E.M., Sheffield, V.C., 2011. Reduction of ER stress via a chemical chaperone prevents disease phenotypes in a mouse model of primary open angle glaucoma. J. Clin. Invest. 121, 3542–3553. https://doi.org/10.1172/JCI58183

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      The authors attempted to replicate previous work showing that counterconditioning leads to more persistent reduction of threat responses, relative to extinction. They also aimed to examine the neural mechanisms underlying counterconditioning and extinction. They achieved both of these aims and were able to provide some additional information, such as how counterconditioning impacts memory consolidation. Having a better understanding of which neural networks are engaged during counterconditioning may provide novel pharmacological targets to aid in therapies for traumatic memories. It will be interesting to follow up by examining the impact of varying amounts of time between acquisition and counterconditioning phases, to enhance replicability to real-world therapeutic settings.

      Major strengths

      · This paper is very well written and attempts to comprehensively assess multiple aspects of counterconditioning and extinction processes. For instance, the addition of memory retrieval tests is not core to the primary hypotheses but provides additional mechanistic information on how episodic memory is impacted by counterconditioning. This methodical approach is commonly seen in animal literature, but less so in human studies.

      · The Group x Cs-type x Phase repeated measure statistical tests with 'differentials' as outcome variables are quite complex, however, the authors have generally done a good job of teasing out significant F test findings with post hoc tests and presenting the data well visually. It is reassuring that there is a convergence between self-report data on arousal and valence and the pupil dilation response. Skin conductance is a notoriously challenging modality, so it is not too concerning that this was placed in the supplementary materials. Neural responses also occurred in logical regions with regard to reward learning.

      · Strong methodology with regards to neuroimaging analysis, and physiological measures.

      · The authors are very clear on documenting where there were discrepancies from their pre-registration and providing valid rationales for why.

      We thank reviewer 1 for the positive feedback and for pointing out the strengths of our work. We agree that future research should investigate varying times between acquisition and counterconditioning to assess its success in real-life applications.

      Major Weaknesses

      (1) The statistics showing that counterconditioning prevents differential spontaneous recovery are the weakest p values of the paper (and using one-tailed tests, although this is valid due to directions being pre-hypothesized). This may be due to a relatively small number of participants and some variability in responses. It is difficult to see how many people were included in the final PDR and neuroimaging analyses, with exclusions not clearly documented. Based on Figure 3, there are relatively small numbers in the PDR analyses (n=14 and n=12 in counterconditioning and extinction, respectively). Of these, each group had 4 people with differential PDR results in the opposing direction to the group mean. This perhaps warrants mention as the reported effects may not hold in a subgroup of individuals, which could have clinical implications.

      General exclusion criteria are described on page 17. We have added more detailed information on the reasons for exclusion (see page 17). All exclusions were in line with pre-registered criteria. For the analysis the reviewer is referring to (the PDR analysis that investigated whether CC can prevent the spontaneous recovery of differential conditioned threat responses), 18 participants were excluded: 2 participants did not show evidence for successful threat acquisition, as was already indicated on page 17, and 16 participants were excluded due to (partially) missing data. We now explicitly mention the exclusion of the additional 16 participants on page 7 and have updated Figure 3 to improve visibility of the individual data points. Therefore, for this analysis both experimental groups consisted of 15 participants (total N=30).

      It is true that in both groups a few participants show the opposite pattern. Although this may also be due to measurement error, we agree that it is relevant to further investigate this in future studies with larger sample sizes. It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      Reviewer #2:

      Summary:

      The present study sets out to examine the impact of counterconditioning (CC) and extinction on conditioned threat responses in humans, particularly looking at neural mechanisms involved in threat memory suppression. By combining behavioral, physiological, and neuroimaging (fMRI) data, the authors aim to provide a clear picture of how CC might engage unique neural circuits and coding dynamics, potentially offering a more robust reduction in threat responses compared to traditional extinction.

      Strengths:

      One major strength of this work lies in its thoughtful and unique design - integrating subjective, physiological, and neuroimaging measures to capture the various aspects of counterconditioning (CC) in humans. Additionally, the study is centered on a well-motivated hypothesis and the findings have the potential to improve the current understanding of pathways associated with emotional and cognitive control. The data presentation is systematic, and the results on behavioral and physiological measures fit well with the hypothesized outcomes. The neuroimaging results also provide strong support for distinct neural mechanisms underlying CC versus extinction.

      We thank reviewer 2 for the feedback and for valuing the thoughtfulness that went into designing the study.

      Weaknesses:

      (1) Overall, this study is a well-conducted and thought-provoking investigation into counterconditioning, with strong potential to advance our understanding of threat modulation mechanisms. Two main weaknesses concern the scope and decisions regarding analysis choices. First, while the findings are solid, the topic of counterconditioning is relatively niche and may have limited appeal to a broader audience. Expanding the discussion to connect counterconditioning more explicitly to widely studied frameworks in emotional regulation or cognitive control would enhance the paper's accessibility and relevance to a wider range of readers. This broader framing could also underscore the generalizability and broader significance of the results. In addition, detailed steps in the statistical procedures and analysis parameters seem to be missing. This makes it challenging for readers to interpret the results in light of potential limitations given the data modality and/or analysis choices.

      In this updated version of the manuscript, we included the notion that extinction has been interpreted as a form of implicit emotion regulation. In addition to our discussion on active coping (avoidance), we believe that our discussion has an important link to the more general framework of emotion regulation, while remaining within the scope of relevance. Please see pages 14 and 15 for the changes. In addition to being informative to theories of emotion regulation, our findings are also highly relevant for forms of psychotherapy that build on principles of counterconditioning (e.g. the use of positive reinforcement in cognitive behavioral therapy), as we point out in the introduction. We believe this relevance shows that counterconditioning is more than a niche topic. In line with the recommendation from reviewer 2, we added more details and explanations to the statistical procedures and analyses where needed (see responses to recommendations).

      Reviewer #3:

      Summary:

      In this manuscript, Wirz et al use neuroimaging (fMRI) to show that counterconditioning produces a longer lasting reduction in fear conditioning relative to extinction and appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These important findings are supported by convincing evidence and will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians.

      In large part, the authors achieved their aims of giving a qualitative assessment of the behavioural mechanisms of counterconditioning versus extinction, as well as investigating the brain mechanisms. The results support their conclusions and give interesting insights into the psychological and neurobiological mechanisms of the processes that underlie the unlearning, or counteracting, of threat conditioning.

      Strengths:

      · Mostly clearly written with interesting psychological insights

      · Excellent behavioural design, well-controlled and tests for a number of different psychological phenomena (e.g. extinction, recovery, reinstatement, etc).

      · Very interesting results regarding the neural mechanisms of each process.

      · Good acknowledgement of the limitations of the study.

      We thank reviewer 3 for the detailed feedback and suggestions.

      Weaknesses:

      (1) I think the acquisition data belongs in the main figure, so the reader can discern whether or not there are directional differences prior to CC and extinction training that could account for the differences observed. This is particularly important for the valence data which appears to differ at baseline (supplemental figure 2C).

      Since our design is quite complex with a lot of results, we left the fear acquisition results as a successful manipulation check in the Supplementary Information to not overload the reader with information that is not the main focus of this manuscript. If the editor would like us to add the figure to the main text, we are happy to do so. During fear acquisition, both experimental groups showed comparable differential conditioned threat responses as measured by PDRs and SCRs. Subjective valence ratings indeed differed depending on CS category. Importantly, however, the groups only differed with respect to their rating to the CS- category, but not the CS+ category, which suggests that the strength of the acquired fear is similar between the groups. To make sure that these baseline differences cannot account for the differences in valence after CC/Ext, we ran an additional group comparison with differential valence ratings after fear acquisition added as a covariate. Results show that despite the baseline difference, the group difference in valence after CC/Ext is still significant (main effect Group: F<sub>(1,43)</sub>=7.364, p=0.010, η<sup>2</sup>=0.146). We have added this analysis to the manuscript (see page 7).
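
      As a quick consistency check on the covariate-adjusted statistic above, an effect size can be recovered from an F value and its degrees of freedom. The sketch below assumes that the reported η<sup>2</sup> is a partial eta squared; the response does not state this explicitly, and the function name is purely illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom.

    Assumption: the effect size reported in the response is partial eta
    squared, computed as F * df_effect / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the reported main effect of Group: F(1,43) = 7.364
eta_p2 = partial_eta_squared(7.364, 1, 43)
print(round(eta_p2, 3))  # 0.146
```

      Under that assumption, the recomputed value agrees with the reported η<sup>2</sup>=0.146.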

      (2) I was confused in several sections about the chronology of what was done and when. For instance, it appears that individuals went through re-extinction, but this is just called extinction in places.

      We understand that the complexity of the design may require a clearer description. We therefore made some changes throughout the manuscript to improve understanding. Figure 1 is very helpful in understanding the design and we therefore refer to that figure more regularly (see pages 6-7). We also added the time between tasks where appropriate (e.g. see page 7). Re-extinction after reinstatement was indeed mentioned once in the manuscript. Given that the reinstatement procedure was not successful (see page 9), we could not investigate re-extinction and it is therefore indeed not relevant to explicitly mention and may cause confusion. We therefore removed it (see page 12).

      (3) I was also confused about the data in Figure 3. It appears that the CC group maintained differential pupil dilation during CC, whereas extinction participants didn't, and the authors suggest that this is indicative of the anticipation of reward. Do reward-associated cues typically cause pupil dilation? Is this a general arousal response? If so, does this mean that the CSs become equally arousing over time for the CC group whereas the opposite occurs for the extinction group (i.e. Figure 3, bottom graphs)? It is then further confusing as to why the CC group lose differential responding on the spontaneous recovery test. I'm not sure this was adequately addressed.

      Indeed, reward and reward anticipation also evoke an increase in pupil dilation. This was an important reason for including a separate valence-specific response characterization task. Independently from the conditioning task, this task revealed that both threat and reward anticipation induced strong arousal-related PDRs and SCRs. This was also reflected in the explicit arousal ratings, which were stronger for both the shock-reinforced (negative valence) and reward-reinforced (positive valence) stimuli. Therefore, it is not surprising that reward anticipation leads to stronger PDRs for CS+ (which predict reward) compared to CS- stimuli (which do not predict reward) during CC, whereas this differential response is reduced during extinction due to a decrease in shock anticipation. During the spontaneous recovery test, a return of stronger PDRs for CS+ compared to CS- stimuli in the standard extinction group can only reflect a return of shock anticipation. Importantly, the CC group received no rewards during the spontaneous recovery task and was aware of this, so it is to be expected that the effect is weakened in the CC group. However, CS+ and CS- items were still rated as similar in valence and PDRs did not differ between CS+ and CS- items in the CC group, whereas the Ext group rated the CS+ significantly more negatively and threat responses to the CS+ did return. It is therefore reasonable to conclude that associating the CS+ with reward helps to prevent a return of threat responses. We have added some clarifications and conclusions to this section on page 8.

      (4) I am not sure that the memories tested were truly episodic

      In line with previous publications from Dunsmoor et al.[1-4], our task allows for the investigation of memory for elements of a specific episode. In our task, retrieval of a picture probes retrieval of the specific episode in which the picture was presented. In contrast, fear retrieval relies on the retrieval of the category-threat association, which does not rely on retrieval of these specific episodic elements, but could be semantic in nature, as retrieval takes place at a conceptual level. We have added a small note on what we mean by episodic in this context on page 4. We do agree that we cannot investigate other aspects of episodic memories here, such as context, as this was not manipulated in this experiment.

      (5) Twice as many female participants as males

      It is indeed unfortunate that there is no equal distribution between female and male participants. Investigating sex differences was not the goal of this study, but we do hope that future studies with the appropriate sample sizes are able to investigate this specifically. We have added this to the limitations of this study on page 17.

      (6) No explanation as to why shocks were varied in intensity and how (pseudo-randomly?)

      The shock determination procedure is explained on pages 18-19 (Peripheral stimulation). As is common in fear conditioning studies in humans (see references), an ascending staircase procedure was used. The goal of this procedure is to try and equalize the subjective experience of the electrical shocks to be “maximally uncomfortable but not painful”.

      Recommendations for the authors:

      Reviewer #1:

      Very well written. No additional comments

      We thank reviewer 1 for valuing our original manuscript version. To further improve the manuscript, we adapted the current version based on the reviewer’s public review (see response to reviewer #1 public review comment 1).

      Reviewer #2:

      (1) I feel that more justification/explanation is needed on why other regions highly relevant to different aspects of counterconditioning (e.g., threat, memory, reward processing) were not included in the analyses.

      We first performed whole-brain analyses to get a general idea of the different neural mechanisms of CC compared to Ext. Clusters revealing significant group differences were then further investigated by means of preregistered ROI analyses. We included regions that have previously been shown to be most relevant for affective processing/threat responding (amygdala), memory (hippocampus), reward processing (NAcc) and regular extinction (vmPFC). We restricted our analyses to these most relevant ROIs as preregistered to prevent inflated or false-positive findings[5]. Beyond these preregistered ROIs, we applied appropriate whole-brain FWE corrections. The activated regions are listed in Supplementary Table 1 and include additional regions that were expected, such as the ACC and insula.

      (2) Were there observed differences across participants in the experiment? Any information on variance in the data such as how individual differences might influence these findings would provide a richer understanding of counterconditioning and increase the depth of interpretation for a broad readership.

      We agree that investigating individual differences is crucial to gain a better understanding of treatment efficacy in the framework of personalized medicine. Specifically, future research should aim to identify factors that help predict which treatment will be most effective for a particular patient. The results of this study provide a good basis for this, as we could show that the vmPFC, in contrast to regular extinction, is not required in CC to improve the retention of safety memory. Therefore, this provides a viable option for patients who are not responding to treatments that rely on the vmPFC. In addition, as noted by Reviewer 1, in both groups a few participants show the opposite pattern (see Figure 3). It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      (3) While most figures are informative and clear, Figure 3 would benefit from detailed axis labels and a more descriptive caption. Currently, it is challenging to navigate the results presented to support the findings related to differential PDRs. A supplementary figure consolidating key patterns across conditions might also further facilitate understanding of this rather complicated result.

      We have made some changes to the figure to improve readability and understanding. Specifically, we changed the figure caption to “Change from last 2 trials CC/Ext to first 2 trials Spontaneous recovery test”, to give more details on what exactly is shown here. We also simplified the x-axis labels to “counterconditioning”, “recovery test” and “extinction”. With the addition of a clearer figure description, we hope to have improved understanding and do not think that another supplemental figure is needed.
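
      The quantity named in the revised caption can be illustrated with a minimal sketch. It assumes that the differential response is the mean CS+ response minus the mean CS- response, and that the change is taken as early recovery test minus late CC/Ext; the function name and the numbers are hypothetical and are not taken from the study data.

```python
from statistics import mean


def differential_change(cs_plus_late, cs_minus_late, cs_plus_early, cs_minus_early):
    """Change in the differential response (CS+ minus CS-) from the last
    trials of CC/Ext to the first trials of the recovery test.

    Assumption: positive values indicate a larger CS+ > CS- difference at
    the recovery test than at the end of CC/Ext.
    """
    late_diff = mean(cs_plus_late) - mean(cs_minus_late)
    early_diff = mean(cs_plus_early) - mean(cs_minus_early)
    return early_diff - late_diff


# Hypothetical pupil dilation values for one participant (arbitrary units):
# last 2 trials of CC/Ext, then first 2 trials of the recovery test.
change = differential_change([0.2, 0.2], [0.1, 0.1], [0.5, 0.5], [0.1, 0.1])
print(round(change, 2))  # 0.3
```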

      (4) Additional details on the statistical tests are needed. For example, please clarify whether p-values reported were corrected across all experimental conditions. Also, it would be helpful for the authors to discuss why for example repeated measures ANOVA or mixed-effects conditions were not used in this study. Might those tests not capture variance across participants' PDRs and SCRs over time better?

      We added that significant interactions were followed by Bonferroni-adjusted post-hoc tests where applicable (see page 21). We have used repeated measures ANOVAs to capture early versus late phases of acquisition and CC/extinction, as well as to compare late CC/extinction (last 2 trials) with early spontaneous recovery (first 2 trials), as is often done in the literature. A trial-level factor in a small sample would cost too many degrees of freedom and is not expected to provide more information. We have added this information and our reasoning to the methods section on page 21.
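
      For readers unfamiliar with the adjustment mentioned above, the standard Bonferroni procedure multiplies each p-value by the number of comparisons and caps the result at 1. The sketch below shows only this general rule; the function name and example p-values are hypothetical, and the set of comparisons actually corrected in the manuscript is not specified here.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of
    tests in the family, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical post-hoc p-values from three pairwise comparisons
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print([round(p, 2) for p in adjusted])  # [0.03, 0.12, 1.0]
```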

      Reviewer #3:

      (1) Suggest putting acquisition data into the main figures. In fact many of the supplemental figures could be integrated into the main figures in my opinion.

      See response to reviewer #3 public review comment 1.

      (2) Include explanations for why shock intensity was varied

      See response to reviewer #3 public review comment 6.

      (3) Include a better explanation for the change in differential responding from training to spontaneous recovery in the CC group (I think the loss of such responding in extinction makes more sense and is supported by the notion of spontaneous recovery, but I'm not sure about the loss in the CC group. There is some evidence from the rodent literature - which I am most familiar with - regarding a loss in contextual gradient across time which could account for some loss in specificity, could it be something like this?).

      See response to reviewer #3 public review comment 3.

      If we understand the reviewer correctly in that we see a loss of differential responding due to a generalization to the CS-, this would imply an increase in responding to the CS-, which is not what we see. Our data should therefore be correctly interpreted as a loss of the specific response to the CS+ from the CC phase to the recovery test. Therefore, there is no spontaneous recovery in the CC group, and also not a non-specific recovery. To clarify this we relabeled Figure 3 by indicating “recovery test” instead of “spontaneous recovery”.

      (4) Is there a possibility that baseline differences, particularly that in Supplemental Figure 2C, could account for later differences? If differences persist after some transformation (e.g. percentage of baseline responding) this would be convincing to suggest that it doesn't.

      See response to reviewer #3 public review comment 1.

      (5) As I mentioned, I got confused by the chronology as I read through. Maybe mention early on when reporting the spontaneous recovery results that testing occurred the next day and that participants were undergoing re-extinction when talking about it for the second time.

      See response to reviewer #3 public review comment 2.

      (6) Page 8 - I was confused as to why it is surprising that the CC group were more aroused than the extinction group, the latter have not had CSs paired with anything with any valence, so doesn't this make sense? Or perhaps I am misunderstanding the results - here in text the authors refer back to Figure 2B, but I'm not sure if this is showing data from the spontaneous recovery test or from CC/extinction. If it is the latter, as the caption suggests, why are the authors referring to it here?

      Participants in the CC group showed increased differential self-reported arousal after CC, whereas arousal ratings did not differ between CS+ and CS- items after extinction. We interpret this in line with the valence and PDR results as an indication of reward-induced arousal. At the start of the next day, however, participants from the CC and extinction groups gave comparable ratings. It may therefore be surprising that participants in the CC group do not still show stronger ratings, since nothing happened between these two ratings besides a night’s sleep (see design overview in Figure 1A). We removed the “surprisingly” to prevent any confusion.

      (7) I suggest that the authors comment on whether there were any gender differences in their results.

      See response to reviewer #3 public review comment 5.

      (8) The study makes several claims about episodic memory, but how can the authors be sure that the memories they are tapping into are episodic? Episodic has a very specific meaning - a biographical, contextually-based memory, whereas the information being encoded here could be semantic. Perhaps a bit of clarification around this issue could be helpful.

      See response to reviewer #3 public review comment 4.

      References

      (1) Dunsmoor, J. E. & Kroes, M. C. W. Episodic memory and Pavlovian conditioning: ships passing in the night. Curr Opin Behav Sci 26, 32-39 (2019). https://doi.org/10.1016/j.cobeha.2018.09.019

      (2) Dunsmoor, J. E. et al. Event segmentation protects emotional memories from competing experiences encoded close in time. Nature Human Behaviour 2, 291-299 (2018). https://doi.org/10.1038/s41562-018-0317-4

      (3) Dunsmoor, J. E., Murty, V. P., Clewett, D., Phelps, E. A. & Davachi, L. Tag and capture: how salient experiences target and rescue nearby events in memory. Trends Cogn Sci 26, 782-795 (2022). https://doi.org/10.1016/j.tics.2022.06.009

      (4) Dunsmoor, J. E., Murty, V. P., Davachi, L. & Phelps, E. A. Emotional learning selectively and retroactively strengthens memories for related events. Nature 520, 345-348 (2015). https://doi.org/10.1038/nature14106

      (5) Gentili, C., Cecchetti, L., Handjaras, G., Lettieri, G. & Cristea, I. A. The case for preregistering all region of interest (ROI) analyses in neuroimaging research. Eur J Neurosci 53, 357-361 (2021). https://doi.org/10.1111/ejn.14954

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides useful findings about the effects of heterozygosity for Trio variants linked to neurodevelopmental and psychiatric disorders in mice. However, the strength of the evidence is limited and incomplete mainly because the experimental flow is difficult to follow, raising concerns about the conclusions' robustness. Clearer connections between variables, such as sex, age, behavior, brain regions, and synaptic measures, and more methodological detail on breeding strategies, test timelines, electrophysiology, and analysis, are needed to support their claims.

      We appreciate the opportunity to address the constructive feedback provided by eLife and the reviewers. Below, we respond to the overall assessment and individual reviewers' comments, clarifying our experimental approach, addressing concerns, and providing additional details where necessary.

      We thank the editors for highlighting the significance of our findings regarding the effects of Trio variant heterozygosity in mice. We acknowledge the feedback concerning the experimental flow and agree that clarity is paramount. To address these concerns:

      (1) Connections between variables: The word limit of the initial submission constrained our ability to provide adequate details and connections between variables. We have revised the manuscript to explicitly outline and extend explanations and the relationships between sex, age, behavior, brain regions, and synaptic measures, ensuring that the rationale for each experiment and its relevance to the overall conclusions are improved.

      (2) Methodological details: The Methods section of our initial submission was condensed, with key details provided in the Supplemental Methods section. We have merged all into an extended section to improve clarity. We have expanded our description of breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. We believe the additions have enhanced the transparency and reproducibility of our study and ensured full support of our conclusions.

      (3) Experimental flow: We have revised and extended our results, methods, and discussion sections to clarify the rationale and experimental design to guide readers through the experimental sequence and rationale.

      We are confident these revisions address the concerns raised and enhance the robustness and coherence of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study explores how heterozygosity for specific neurodevelopmental disorder-associated Trio variants affects mouse behavior, brain structure, and synaptic function, revealing distinct impacts on motor, social, and cognitive behaviors linked to clinical phenotypes. Findings demonstrate that Trio variants yield unique changes in synaptic plasticity and glutamate release, highlighting Trio's critical role in presynaptic function and the importance of examining variant heterozygosity in vivo.

      Strengths:

      This study generated multiple mouse lines to model each Trio variant, reflecting point mutations observed in human patients with developmental disorders. The authors employed various approaches to evaluate the resulting behavioral, neuronal morphology, synaptic function, and proteomic phenotypes.

      Weaknesses:

      While the authors present extensive results, the flow of experiments is challenging to follow, raising concerns about the strength of the experimental conclusions. Additionally, the connection between sex, age, behavioral data, brain regions, synaptic transmission, and plasticity lacks clarity, making it difficult to understand the rationale behind each experiment. Clearer explanations of the purpose and connections between experiments are recommended. Furthermore, the methodology requires more detail, particularly regarding mouse breeding strategies, timelines for behavioral tests, electrophysiology conditions, and data analysis procedures.

      We appreciate the reviewer’s recognition of the novelty and comprehensiveness of our approach, particularly the generation of multiple mouse lines and our efforts to model Trio variant effects in vivo.

      Weaknesses

      (1) Experimental flow and rationale and connection between variables: We have expanded on the connections between behavioral data, neuronal morphology, synaptic function, and proteomics in the Results and Discussion sections to clarify how each experiment informs the reasoning and the conclusions and to highlight the relationships between sex, age, behavior, and synaptic measures.

      (2) Methodological details: Our initial Methods section was formatted to be short to fulfill word limits on the submitted version, with additional details provided in the Supplemental Methods section. We have merged our Methods and Supplemental Methods sections and expanded on our breeding strategies, test timelines, electrophysiological protocols, and data analysis. We believe these additions enhance the transparency and reproducibility of our study.

      (3) Recommendations for the authors: We thank Reviewer #1 for providing several recommendations to improve our manuscript. We have addressed their comments in the revision, as detailed below, adding key experiments that bolster our findings.

      Reviewer #2 (Public review):

      Summary:

      The authors generated three mouse lines harboring ASD, Schizophrenia, and Bipolar-associated variants in the TRIO gene. Anatomical, behavioral, physiological, and biochemical assays were deployed to compare and contrast the impact of these mutations in these animals. In this undertaking, the authors sought to identify and characterize the cellular and molecular mechanisms responsible for ASD, Schizophrenia, and Bipolar disorder development.

      Strengths:

      The establishment of TRIO dysfunction in the development of ASD, Schizophrenia, and Bipolar disorder is very recent and of great interest. Disorder-specific variants have been identified in the TRIO gene, and this study is the first to compare and contrast the impact of these variants in vivo in preclinical models. The impact of these mutations was carefully examined using an impressive host of methods. The authors achieved their goal of identifying behavioral, physiological, and molecular alterations that are disorder/variant specific. The impact of this work is extremely high given the growing appreciation of TRIO dysfunction in a large number of brain-related disorders. This work is very interesting in that it begins to identify the unique and subtle ways brain function is altered in ASD, Schizophrenia, and Bipolar disorder.

      Weaknesses:

      (1) Most assays were performed in older animals and perhaps only capture alterations that result from homeostatic changes resulting from prodromal pathology that may look very different.

      (2) Identification of upregulated (potentially compensating) genes in response to these disorder-specific Trio variants is extremely interesting. However, a functional demonstration of compensation is not provided.

      (3) There are instances where data is not shown in the manuscript. See "data not shown". All data collected should be provided even if significant differences are not observed.

      I consider weaknesses 1 and 2 minor. While they would be very interesting to explore, these experiments might be more appropriate for a follow-up study. I would recommend that the missing data in 3 should be provided in the supplemental material.

      We are grateful for the reviewer’s recognition of our study’s significance and methodological rigor. The acknowledgment of Trio dysfunction as a novel and impactful area of research is deeply appreciated.

      Weaknesses:

      We agree that focusing on older animals limits insights into early-stage pathophysiology. However, our goal in this study was to examine the functional impacts of Trio heterozygosity at an adolescent stage and to reveal the ultimate impact of these alleles on synaptic function. Our choice of age aligns with our objectives. Future studies of earlier developmental stages will be beneficial and complement these findings.

      Functional compensation:

      We tested functional compensation through rescue experiments in +/K1431M brain slices using a Rac1-specific inhibitor, NSC23766, which prevents Rac1 activation by Trio or Tiam1. Our finding that direct Rac1 inhibition normalizes deficient neurotransmitter release in +/K1431M mice strongly suggests that increased Rac1 activity drives this phenotype.

      Data not shown:

      We will incorporate all data previously not shown into the Supplemental Materials, even when results are nonsignificant. We agree that this ensures full transparency and facilitates a more comprehensive evaluation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1K-N, the lack of observed differences in +/M2145T mice across all tests raises questions about its validity as a BPD model. Furthermore, the differences in female behavior data compared to males, as shown in the Supplemental section, lack clarification; specifically, whether these variations are due to sex differences or sample size disparities is not discussed. Additionally, it's unclear if the same mice were used in tests K through L-N, as the reported numbers differ without explanation; if relevant, any mortality should be reported. Given the observed body weight differences, it is important to display locomotor data, despite the mention of no change in open field results. Lastly, a detailed breeding strategy and timeline for behavioral testing would enhance clarity.

      We thank Reviewer 1 for recognizing these confusing points in our behavioral data and seek to add clarification in our Revision as below:

      (a) We have revised the text to emphasize our goal to evaluate the impact of NDD-related Trio alleles that have discrete and measurable effects on brain development and function, and not to model specific NDDs (e.g. ASD, SCZ, or BPD). The three specific Trio mutations were chosen based on strong evidence of these mutations impairing the biochemical functions of Trio. We reasoned our approach would reveal how impairing Trio in different ways – i.e. altering protein level or GEF1/GEF2 function – and under genetic conditions (heterozygosity) that mimic those found in individuals with Trio-related disorders impacts brain development and function. The lack of behavioral phenotypes in +/M2145T mice is indeed intriguing, especially given the alterations in electrophysiology and biochemistry experiments. It remains possible that further behavioral analyses of these mice will reveal behavioral phenotypes.

      (b) Given that the prevalence and clinical presentation of individuals with various NDDs are influenced by sex, it is possible that the behavioral differences we see in male versus female Trio variant mice reflect human sex difference phenotypes. We have reorganized the Figure panels to clarify these sex differences in behaviors (new Fig. 2, Supp. Fig. 2). We focused on the most significant behavioral phenotypes shared by both sexes in the main text, or in males alone, as our anatomical and electrophysiological experiments were restricted to males to reduce variation due to estrus. The observed behavioral sex differences are not likely due to sample size disparities as power analyses were performed for all experimental results to ensure adequate sample size. A comprehensive study of the mechanisms underlying these behavioral findings merits examination but is outside the scope of this study.

      (c) All mice were subjected to all behavioral tests described. No sudden mortality was observed during the behavioral experiments. Outliers in post-hoc statistical analyses were removed, which explains the apparent sample size differences between behavioral tests. We have revised the Data analysis section in our Methods to include these details (Lines 216-289, 450-457).

      (d) Results of the open field test have been added to the Supplemental Data (new Supp. Fig. 2) and Results (Lines 532-537)

      (e) The Methods section was expanded to include more detail on the breeding strategy (Lines 98-106). A timeline for behavioral testing has also been included in the Figures to enhance clarity (new Fig. 2A).

      (2) In Figure 2A-E, head width and brain weight showed significant differences, but not body weight, how come the ratio does not change? Comparing with female results in Supplementary Figure 2A-E, it does show a difference between males and females. It is essential to clarify which sex authors use in all follow-up experiments, including synapse, transmission, and plasticity. Since the males and females have different phenotypes, why do the authors focus on males only? The E plot has no data points on the bar graph. In Figure 2I, it lacks example images for all four conditions.

      We greatly appreciate this Reviewer’s attention to details in our brain and body weight data and revised the manuscript to address these concerns.

      (a) The ratios of head width/body weight were calculated for each individual mouse. Hence the distribution of the ratio data (old Fig. 2D; new Fig. 3D) differs from the distribution of head width or body weight data alone (old Fig. 2A, 2C, resp.; now Fig. 3A, 3C), and therefore can affect the p-value for statistical significance. The body weight of +/M2145T males is 21.217 ± 0.327 g, while that of WT males is 21.745 ± 0.224 g, a non-significant decrease of 0.528 g (adjusted p=0.3806). These values have been added to the Fig. 3 legend (Lines 1020-1034) for clarity.

      (b) Similar to the behavioral experiments in comment (1), we observed sex differences in head width, brain weight, and body weight in Trio heterozygous variant mice compared to WT counterparts. The differences in the ratios of head width/body weight or brain weight/body weight were the same for both males and females (i.e. head width/body weight ratio is decreased in +/K1431M mice compared to WT regardless of sex, and brain weight/body weight ratio is decreased in both +/K1431M and +/K1918X mice compared to WT regardless of sex). These findings affirm the impact of Trio mutations on these phenotypes across both sexes. We have modified the text to draw more attention to this key point (Lines 554-566 and 777-801).

      (c) All experiments (excluding behavior and weight data) were performed in males only to minimize the variation in spine and synapse morphology and physiological activity that can occur due to estrus. We have clarified this in the ‘Animal Work’ section of the Methods (Lines 103-106) as well as in the Figure Legends.

      (d) We thank the Reviewer for pointing out Fig. 3E lacks individual data points on the bar graph. Fig. 3E has been modified to now include the brain weight/body weight ratio for each individual mouse rather than across the population, to be consistent with the calculation of head width/body weight ratio (see point 2a).

      On original submission, only a representative WT image was selected due to space constraints. The figure (new Fig. 3H and 3K) and figure legend have been revised to include representative traces for all genotypes examined.
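
      The per-animal ratio calculation described in point (a) can be illustrated with a minimal sketch. The measurements below are hypothetical values chosen only for illustration (they are not data from the study), and the function and variable names are assumptions. The sketch shows why a ratio computed for each mouse yields its own distribution, which can be tested statistically, whereas a ratio of group means collapses to a single number.

```python
from statistics import mean, stdev

# Hypothetical measurements for illustration only (not data from the study).
head_width_mm = [10.1, 10.4, 9.8, 10.6]
body_weight_g = [21.0, 22.5, 20.2, 23.1]


def per_animal_ratios(numerators, denominators):
    """Return one ratio per animal, pairing the two measurements by index."""
    if len(numerators) != len(denominators):
        raise ValueError("each animal needs both measurements")
    return [n / d for n, d in zip(numerators, denominators)]


ratios = per_animal_ratios(head_width_mm, body_weight_g)

# Mean of per-animal ratios: comes with a spread across animals.
mean_of_ratios = mean(ratios)
sd_of_ratios = stdev(ratios)

# Ratio of group means: a single number with no per-animal distribution.
ratio_of_means = mean(head_width_mm) / mean(body_weight_g)

print(f"mean of per-animal ratios: {mean_of_ratios:.4f} (SD {sd_of_ratios:.4f})")
print(f"ratio of group means:      {ratio_of_means:.4f}")
```

      Because the two summaries generally differ, a significant difference in one raw measurement does not by itself guarantee a significant difference in the per-animal ratio.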

      (3) In lines 315-320, "None of the Trio variant heterozygotes exhibited altered dendritic spine density on M1 L5 pyramidal neurons compared to WT mice on either apical or basal arbors (Supplementary Figure 3L, M). Electron microscopy of cortical area M1 L5 revealed that synapse density was significantly increased in +/K1918X mice compared to WT (Figure 3A, B), possibly due to a net reduction in neuropil resulting from smaller dendritic arbors." The proposed explanation does not adequately address the observed discrepancy between spine density and synapse density reported in these two experiments. A more thorough analysis is needed to reconcile these conflicting findings and clarify how these distinct measurements may relate to each other in the context of the study's conclusions.

      We acknowledge the apparent discrepancy between our dendritic spine density data, which is unchanged from WT for all three Trio variant heterozygotes, and our synapse density data, which showed an increase in +/K1918X M1 L5 compared to WT. We have expanded the explanation for this discrepancy below and added this to the Discussion (Lines 802-811):

      a) Because spine density can vary by dendritic branch order and distance from the soma, only protrusions from secondary dendritic arbors of M1 L5 pyramidal neurons were quantified for consistency in analyses. However, all synapses meeting criteria were quantified in EM images, regardless of where they were located along an individual neuron’s arbors. It is possible that the density and distribution of spines along other arbors differ between genotypes but were not captured in our current data.

      b) +/K1918X L5 pyramidal neurons are smaller and less complex than WT neurons, especially in the basal compartment corresponding to L5 where EM images were obtained, consistent with the smaller brain size and reduced cortical thickness of +/K1918X mice. We posit that due to their smaller dendritic field size, L5 neurons pack more densely, contributing to the increased synapse density observed in +/K1918X M1 L5 cortex. Consistent with this hypothesis, we observed a trend toward increased DAPI+ cell density in M1 L5 of +/K1918X mice (Supp. Fig. 3N).

      (4) In Figure 4, one potential rationale for measuring AMPAR mEPSC frequency is to infer synapse density changes. However, the findings show no frequency change in +/K1431M and +/K1918X, with an increase only in +/M2145T, which contradicts Figure 3 results indicating a trend toward increased density across variants.

      This inconsistency is confusing, especially since the authors claim to follow the methodology from the study "Trio Haploinsufficiency Causes Neurodevelopmental Disease-Associated Deficits"; yet, the observed mEPSC amplitude differs significantly from that study, while the frequency remains unaffected. Additionally, the NMDAR mEPSCs reflect combined AMPAR and NMDAR responses at positive holding potentials, with peak amplitude dominated by AMPAR. This inconsistency between holding potential results is unclear, as frequency should theoretically align across negative and positive potentials. For accurate NMDAR mEPSC measurement, it would be optimal to assess amplitude 50 ms post-initial peak and, if possible, increase the holding potential to enhance the driving force given the typically low signal of NMDAR response.

      We thank the Reviewer for highlighting these important points.

      a) Previous work from our lab and others demonstrates that Trio regulates synaptic AMPA receptor levels, which is why we chose to focus on AMPAR-mediated evoked and miniature EPSC frequencies and amplitudes in the current study. We acknowledge Reviewer 1’s comment on seemingly contradictory results regarding AMPAR mEPSC frequency and synapse density; however, the unchanged AMPAR mEPSC frequency in +/K1431M and +/K1918X mice is consistent with our finding of unaltered dendritic spine density in these mice compared to WT (Supp. Fig. 4L,M). The difference between dendritic spine counts and synapse density is addressed in Response (3) above.

      b) While synapse density changes can be inferred from AMPAR mEPSC frequency, mEPSCs also measure changes in spontaneous neurotransmitter release, especially in the absence of changes in synapse number. Notably, the increased mEPSC frequency in the +/M2145T variant is linked to enhanced spontaneous release, not to spine or synapse density changes. These findings are reinforced by an increase in synaptic vesicle counts, calculated PPR changes, and estimates of the Pr and RRP from HFS train analysis. We have included these points in the Discussion (Lines 861-863).

      c) While it is tempting to compare the current study to our previously published conditional Trio haploinsufficiency model, we highlight key distinctions that may underlie phenotypic differences between these two mouse models. First, our prior model used a NEX-Cre transgene to ablate one Trio allele from excitatory neurons only beginning at embryonic day 11. In contrast, our Trio variants are expressed in all cell types throughout development, akin to the genetic variants found in individuals with TRIO-related disorders. Second, the Trio variant mice in this study are on a C57BL/6 background, while the Trio haploinsufficient mice were on a mixed 129Sv/J X C57BL/6 background. These differences in the current study may explain why some measures, such as mEPSC amplitude, may not align with those from the Trio conditional haploinsufficiency model.

      d) Recordings were performed using specific inhibitors to isolate AMPA and NMDA mEPSCs; these missing methodological details have now been clarified in the updated Methods section (Lines 353-360).

      (5) In Supplementary Figure 4, the sample traces indicate a higher NMDA/AMPA ratio, raising the question of whether the AMPA EPSC amplitude changes, as this could reflect PSD length. In Figure 4B, the increased AMPAR mEPSC amplitude in the +/K1918X condition compared to WT suggests an enhanced postsynaptic response, yet the PSD length is reduced in Figure 3C. Can the authors provide a potential hypothesis to explain this?

      We appreciate the Reviewer’s feedback. Yes, both evoked and miniature recordings indicate increased AMPAR amplitudes in the +/K1918X variants compared to WT. While PSD length is often linked to synaptic strength, the reduction in PSD length observed by EM in +/K1918X synapses is small (~6% of WT) and clearly does not correlate with significant changes in synaptic strength. We also note that the whole cell recordings of mEPSCs represent input from all active synapses on the neuron, while PSD length is measured only in synapses in L5.

      (6) In Figure 4, synaptic plasticity appears to decrease to around 50% of baseline; could this reduction be attributed to LTD, or might it result from changes in pipette resistance? Additionally, is the observed potentiation due to changes in presynaptic release probability? Measuring paired-pulse ratio (PPR) before and after induction would clarify this aspect.

      We thank the Reviewer for highlighting these important points.

      a) We used a well-established theta burst stimulation method for LTP induction in M1 L5 pyramidal neurons. This protocol reliably evokes LTP in WT neurons, as shown in Fig. 5J and K. Both +/K1431M and +/K1918X variants exhibit a slight but discernible increase in evoked excitatory postsynaptic currents (eEPSCs), indicative of the initiation of LTP. Although this increase is smaller compared to WT, the presence of potentiation indicates that long-term depression (LTD) is an unlikely explanation for the observed reduction.

      b) To rule out the influence of technical artifacts, pipette resistance was carefully monitored before and after LTP induction. Any cells exhibiting resistance changes exceeding 20% during electrophysiological recordings were excluded from the analysis, ensuring that fluctuations in pipette resistance did not confound LTP measurements. These technical details are denoted in the Methods (Lines 344-346 and 364-366).

      c) The potentiation in the +/M2145T variant may stem from increased release probability (Pr) and greater synaptic vesicle availability, but is beyond the scope of this work. We agree this is an intriguing question, not only for +/M2145T but also for +/K1431M mice. Future studies should address this, ideally using models where the Trio variant is selectively introduced into the presynaptic neuron.
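
      The exclusion criterion in point (b) can be sketched as a simple filter. This is a minimal illustration, not the authors' analysis code: the recording names and resistance values are hypothetical, and expressing the change relative to the pre-induction value is an assumption, since the response does not state the reference point.

```python
def resistance_change_fraction(before_mohm, after_mohm):
    """Fractional change in resistance relative to the pre-induction value.

    Using the pre-induction value as the reference is an assumption.
    """
    if before_mohm <= 0:
        raise ValueError("resistance must be positive")
    return abs(after_mohm - before_mohm) / before_mohm


def passes_resistance_qc(before_mohm, after_mohm, max_change=0.20):
    """Keep a cell only if its resistance changed by no more than max_change."""
    return resistance_change_fraction(before_mohm, after_mohm) <= max_change


# Hypothetical recordings: (cell id, resistance before, resistance after), in MOhm.
recordings = [
    ("cell_a", 12.0, 13.0),  # ~8% change, kept
    ("cell_b", 10.0, 12.5),  # 25% change, excluded
    ("cell_c", 15.0, 17.0),  # ~13% change, kept
]

kept = [name for name, before, after in recordings
        if passes_resistance_qc(before, after)]
print(kept)
```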

      (7) In lines 377-380, "The +/M2145T PPR curve was unusual, with significantly reduced PPF at short ISIs, yet clearly increased PPF at longer ISI (Figure 5A, B) compared to WT." The unusual PPR observed at the 100 ms ISI appears unexpected. Can the authors provide an explanation for this anomaly? This finding could suggest atypical presynaptic dynamics or modulation at this specific interval, which may differ from typical synaptic behavior. Further insights into possible mechanisms or experimental conditions affecting this result would be valuable.

      "The decreased PPF at initial ISI in +/M2145T mice correlated with increased mEPSC frequency (Fig. 4A-C), suggestive of a possible increase in spontaneous glutamate Pr." If this is the case, it raises the question of why the increased PPR at the initial ISI in +/K1431M does not correspond to the result shown in Figure 4C. This discrepancy suggests that factors beyond initial presynaptic release probability might be influencing the observed synaptic response, or that compensatory mechanisms could be affecting PPR and mEPSC frequency differently in this variant. Further clarification on the interplay between these measurements would help resolve this inconsistency.

      We appreciate the Reviewer’s critical reading and genuine interest in this phenotype in +/M2145T mice.

      a) The unusual shift of the PPR in +/M2145T at ISI 100 ms is fascinating and will require significant additional experimentation that lies beyond the scope of this report to address. We propose it results from altered presynaptic regulators, including increased Syt3 and reduced RhoA activity. Notably, Syt3 influences calcium-dependent SV replenishment, which can cause similar PPR defects (Weingarten DJ et al., 2022); this is now included in the Discussion (Lines 915-918).

      Weingarten DJ, Shrestha A, Juda-Nelson K, Kissiwaa SA, Spruston E, Jackman SL. Fast resupply of synaptic vesicles requires synaptotagmin-3. Nature. 2022 Nov;611(7935):320-325. doi: 10.1038/s41586-022-05337-1. Epub 2022 Oct 19. PMID: 36261524.

      b) Thank you for raising the concern about the clarity of this statement: "The decreased PPF at initial ISI in +/M2145T mice correlated with increased mEPSC frequency (Fig. 4A-C), suggestive of a possible increase in spontaneous glutamate Pr." We have edited the sentence to be clearer (Lines 701-703). First, the K1431M and M2145T variants impact different TRIO catalytic activities, disrupting distinct GTPase pathways and differentially affecting presynaptic regulators, which can lead to non-overlapping phenotypes. Second, we have expanded our discussion to note that the +/K1431M variant data suggest increased AMPAR numbers and fewer silent synapses (Lines 850-855), potentially increasing AMPAR mEPSC frequency and masking the expected decrease in spontaneous release (Lines 905-910). Further experiments are needed, ideally using mixed cultures with TRIO variants in presynaptic neurons with synapses on WT neurons, as minimal stimulation variance analysis in slices would be inconclusive due to its reflection of both Pr and silent synapse changes, similar to mEPSC frequency.
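
      For readers less familiar with the measure discussed in this exchange, the paired-pulse ratio (PPR) is the amplitude of the second evoked response divided by the first, with values above 1 indicating facilitation (PPF) and values below 1 indicating depression. A minimal sketch follows; the inter-stimulus intervals and amplitudes are hypothetical, and the function names are illustrative assumptions rather than the authors' code.

```python
def paired_pulse_ratio(first_amplitude, second_amplitude):
    """PPR = second EPSC amplitude / first EPSC amplitude (absolute magnitudes)."""
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return abs(second_amplitude) / abs(first_amplitude)


def classify_ppr(ppr):
    """Label a PPR value as facilitation, depression, or no change."""
    if ppr > 1.0:
        return "facilitation"
    if ppr < 1.0:
        return "depression"
    return "no change"


# Hypothetical EPSC amplitude pairs (pA) keyed by inter-stimulus interval (ms).
amplitudes_by_isi_ms = {
    35: (100.0, 150.0),
    100: (100.0, 120.0),
    300: (100.0, 95.0),
}

ppr_curve = {
    isi: paired_pulse_ratio(first, second)
    for isi, (first, second) in sorted(amplitudes_by_isi_ms.items())
}

for isi, ppr in ppr_curve.items():
    print(f"ISI {isi} ms: PPR = {ppr:.2f} ({classify_ppr(ppr)})")
```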

      (8) In Figure 5, there is no evidence demonstrating that the NSC inhibitor functions specifically in the +/K1431M condition without affecting other conditions. To verify its specificity, the authors should test the NSC inhibitor's effects across other conditions in parallel, including a control group. Additionally, cumulative RRP measurements should be provided for a more comprehensive assessment of the inhibitor's impact on synaptic function.

      We appreciate the Reviewer’s feedback.

      a) Previous studies have shown that Rac1 activity can bidirectionally regulate synchronous release probability (Pr). We used the Rac1-specific inhibitor NSC23766 (NSC) to test how Rac1 inhibition impacted the neurotransmitter release deficits observed in +/K1431M mice. We also added control experiments testing the impact of NSC on WT slices. These new experiments are now presented in new Fig. 8 of the revised manuscript, with expanded details in the Results (Lines 737-750) and Discussion (Lines 892-900).

      b) To estimate Pr and the RRP, we employed the Decay method as described by Ruiz et al. (2011), which does not rely on cumulative EPSC plots for RRP estimation. This approach was chosen to account for the initial facilitation in these synapses, and fits are done using EPSCs plotted against stimulus number. Additional details have been provided in the Methods section (Lines 367-373).

      Ruiz R, Cano R, Casañas JJ, Gaffield MA, Betz WJ, Tabares L. Active zones and the readily releasable pool of synaptic vesicles at the neuromuscular junction of the mouse. J Neurosci. 2011 Feb 9;31(6):2000-8. doi: 10.1523/JNEUROSCI.4663-10.2011. PMID: 21307238; PMCID: PMC6633039.

      (9) Given the relevance to NDD, specifying the age window of the mice used is crucial. It is confusing that the synaptic function studies were conducted at P42, while the proteomic analysis was performed at P21. Could the authors clarify the rationale behind using different age points for these analyses? Consistency in age selection, or an explanation for this variation, would help in interpreting the developmental relevance of the findings.

      P42 was chosen as the age as it represents young adulthood, by which time clinical features will have already presented in individuals with neurodevelopmental disorders. Our prior studies of NEX-Cre Trio<sup>-/-</sup> mice found significant measurable differences from WT at this age, after neuronal migration, differentiation, synaptogenesis and pruning have occurred. An earlier developmental timepoint, P21, which coincides with juvenile age in mice, was chosen for proteomics studies to identify earlier changes and potentially targetable and modifiable mechanisms that could influence the phenotypes we observed in older mice. The experiments in P42 versus P21 mice were originally two independent lines of investigation that converged in the current study.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1:

      First, I thank the authors for clarifying some of the confusion I had in the previous comment and I appreciate the efforts the authors put into improving the quality of the manuscript. However, my concerns about the lack of novelty of the key findings are not perfectly addressed and there is no additional analysis done in this revision. Currently in this version of the manuscript, asserting that a p-value of 10<sup>-6</sup> is close to genome-wide significance may be considered an overstatement. Further analysis focusing on finding novel and additional discovery is very necessary.

      We thank the reviewer for their comments. Reviewer #2 also made a comment regarding the genome-wide threshold, “However, it remains unclear why the authors found it appropriate to apply STEAM to the LAAA model, a joint test for both allele and ancestry effects, which does not benefit from the same reduction in testing burden.” The reviewers have correctly identified our oversight - we have amended the manuscript as follows:

      (1) The abstract, “We identified a suggestive association peak (rs3117230, p-value = 5.292 x 10<sup>-6</sup>, OR = 0.437, SE = 0.182) in the HLA-DPB1 gene originating from KhoeSan ancestry.”

      (2) From line 233 to 239: “The R package STEAM (Significance Threshold Estimation for Admixture Mapping) (Grinde et al., 2019) was used to determine the admixture mapping significance threshold given the global ancestral proportions of each individual and the number of generations since admixture (g = 15). For the LA model, a genome-wide significance threshold of p-value < 2.5 x 10<sup>-6</sup> was deemed significant by STEAM. The traditional genome-wide significance threshold of 5 x 10<sup>-8</sup> was used for the GA, APA and LAAA models, as recommended by the authors of the LAAA model (Duan et al., 2018).”

      (3) We excluded the results for the signal on chromosome 20, since this also did not reach the LAAA model genome-wide significance threshold.  

      (4) From line 296 to 308: “LAAA models were successfully applied for all five contributing ancestries (KhoeSan, Bantu-speaking African, European, East Asian and Southeast Asian). However, no variants passed the threshold for statistical significance. Although no variants reached genome-wide significance, a suggestive peak was identified in the HLA-II region of chromosome 6 when using the LAAA model and adjusting for KhoeSan ancestry (Figure 3). The QQ-plot suggested minimal genomic inflation, which was verified by calculating the genomic inflation factor (λ = 1.05289) (Supplementary Figure 1). The lead variants identified using the LAAA model whilst adjusting for KhoeSan ancestry in this region on chromosome 6 are summarised in Table 3. The suggestive peak encompasses the HLA-DPA1/B1 (major histocompatibility complex, class II, DP alpha 1/beta 1) genes (Figure 4). It is noteworthy that without the LAAA model, this suggestive peak would not have been observed for this cohort. This highlights the importance of utilising the LAAA model in future association studies when investigating disease susceptibility loci in admixed individuals, such as the SAC population.”
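
      The threshold comparison and the genomic inflation factor mentioned in the amendments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses the conventional definition of the inflation factor (median observed 1-df chi-square statistic divided by the median of the null chi-square distribution, approximately 0.4549), converts two-sided p-values to chi-square statistics with the Python standard library, and runs on synthetic, evenly spaced p-values. The function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(p_values):
    """Median observed 1-df chi-square statistic over the null median.

    Each two-sided p-value (0 < p <= 1) is mapped to z**2,
    where z = Phi^{-1}(p / 2).
    """
    normal = NormalDist()
    chi2_stats = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def passes_threshold(p_value, threshold):
    """True if the p-value is strictly below the significance threshold."""
    return p_value < threshold


# Thresholds and lead-variant p-value quoted in the response.
LA_THRESHOLD = 2.5e-6           # STEAM-derived threshold for the LA model
GENOME_WIDE_THRESHOLD = 5e-8    # used for the GA, APA and LAAA models
lead_p = 5.292e-6

print(passes_threshold(lead_p, GENOME_WIDE_THRESHOLD))  # lead variant vs. 5e-8
print(passes_threshold(lead_p, LA_THRESHOLD))           # lead variant vs. 2.5e-6

# Synthetic, evenly spaced p-values mimic a well-calibrated null distribution.
null_p_values = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation_factor(null_p_values), 3))
```

      A value near 1, as for the synthetic null p-values here, indicates little inflation of the test statistics.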

      We acknowledge that our results are not statistically significant. However, our study advances this area of research by identifying suggestive African-specific ancestry associations with TB in the HLA-II region. These findings build upon the work of the ITHGC, which did not identify any significant associations or suggestive peaks in their African-specific analyses. We have included this argument in our manuscript (from lines 425 to 432):

      “The ITHGC did not identify any significant associations or suggestive peaks in their African ancestry-specific analyses. Notably, the suggestive peak in the HLA-DPB1 region was only captured in our cohort using the LAAA model whilst adjusting for KhoeSan local ancestry. This underscores the importance of incorporating global and local ancestry in association studies investigating complex multi-way admixed individuals, as the genetic heterogeneity present in admixed individuals (produced as a result of admixture-induced and ancestral LD patterns) may cause association signals to be missed when using traditional association models (Duan et al., 2018; Swart, van Eeden, et al., 2022).”

      We appreciate the comment regarding additional analyses. We acknowledge that we did not validate our SNP peak in the HLA-II region through fine-mapping due to the lack of a suitable reference panel (see lines 490 to 500). Our long-term goal is to develop a HLA-imputation reference panel incorporating KhoeSan ancestry; however, this is beyond the scope and funding allowances of this study.

      Reviewer #2 (Recommendations for the authors):

      We think the authors have done an excellent job with their responses and the manuscript has been substantially improved.

      Thank you for taking the time to help us improve our manuscript.

    1. Author response:

      We are grateful to the reviewers for their extensive and constructive feedback. By and large, the three reviewers noted the following main points:

      (1) The overall evidence for any rhythmicity in this data is not ‘very strong’.

      We do agree and will tone down the conclusions accordingly. However, as one of the reviewers noted, a qualitative interpretation of the specific statistical results remains somewhat vague and speculative by necessity.

      (2) The differences between the results for the individual experiments are generally small. Yet, the same reviewer also asks for speculations as to how differences between experiments can be interpreted.

      We will consider these, but also note that a clear demonstration of the robustness of specific effects requires the replication of individual experiments in a separate experiment.

      (3) A clear-cut interpretation of the current experimental design in the context of continuous listening and true vigilance tasks remains difficult. This makes the interpretation and generalization of the results difficult.

      We do agree in principle, but also note that task designs vary widely in previous work, which may be one reason why there is no clear consensus on the existence or absence of a rhythmic mode of listening. We will consider specific suggestions for future work to be included in the revision.

      (4) The adjustment of task difficulty in the present task design may pose a challenge. Reviewers also suggest analyzing potential rhythmicity in this task difficulty parameter.

      We will consider this for the revision.

      (5) A more clear-cut interpretation of what potential differences in the rhythmicity of sensitivity and bias would mean should be included.

      We will provide this in the revision.

      (6) The study should provide a stronger conceptual framework both for the source of "rhythmic modes" and why one may expect differences between ears.

      By and large, this has been put forward by many previous studies testing and reporting rhythmicity in auditory tasks. Rhythmicity is pervasive in neural activity, but whether and how this relates to behavioral data remains less clear. These points will be clarified in a revision.

      (7) Parallels to work in the visual domain by Fiebelkorn, Landau & Fries should be included.

      We will discuss similarities and differences between studies on perceptual rhythmicity in the visual and auditory domains.

    1. Author Response:

      eLife assessment

      This is a valuable initial study of cell type and spatially resolved gene expression in and around the locus coeruleus, the primary source of the neuromodulator norepinephrine in the human brain. The data are generated with cutting-edge techniques, and the work lays the foundation for future descriptive and experimental approaches to understand the contribution of the locus coeruleus to healthy brain function and disease. However, due to small sample size and the need for additional confirmatory data, the data only incompletely support the main conclusions presented here. With the strengthening of the analyses, this paper, and the associated web application, will be of great interest to neuroscientists working on arousal-based behaviors and neurological and neuropsychiatric phenotypes.

      Thank you for the assessment and comments. Overall, the majority of the issues raised by the reviewers relate either directly or indirectly to limitations of the sample size that precluded further optimization of protocols and expansion of the dataset. We fully acknowledge the limited sample size in this dataset and aim to be transparent about the limitations of the study. This is the first report of snRNA-seq and spatially-resolved transcriptomics in the human locus coeruleus (LC). The LC is a very small nucleus, located deep within the brainstem, which is extremely challenging to study due to its small size, difficult to access location, and the very small number of norepinephrine (NE) neurons located within the nucleus, which were of prime interest for this study. We note that this study represents our initial attempt to molecularly and spatially characterize cell types within the human LC. We note that we did not have significant, established funding from extramural sources dedicated to this study, and tissue resources for the LC are difficult to ascertain, contributing to the small sample size in this initial study. We acknowledge that there are limitations in sample size as well as data quality. Findings from this study will be used to inform, improve, and optimize future and ongoing experimental design, as well as technical and analytical workflows for larger-scale studies. As brought up by one of the reviewers, this field is still in its infancy -- pilot experimentation in new brain regions is labor-intensive and these sequencing approaches remain costly. Moreover, due to the small size and difficulties in dissecting, tissue resources from the human brain in this area are a highly limited resource. Hence, notwithstanding limitations, in our view it is important to release the data for community access at this time. Specific responses to the reviewers’ comments are provided point-by-point in the following sections.

      Reviewer #1 (Public Review):

      Weber et al. collect locus coeruleus (LC) tissue blocks from 5 neurotypical European men, dissect the dorsal pons around the LC and prepare 2-3 tissue sections from each donor on a slide for 10X spatial transcriptomics. […] The authors transparently present limitations of their work in the discussion, but some points discussed below warrant further attention.

      Specific comments:

      1) snRNAseq:

      a. Major concerns with the snRNAseq dataset are A) the low recovery rate of putative LC-neurons in the snRNAseq dataset, B) the fact that the LC neuron cluster is contaminated with mitochondrial RNA, and C) that a large fraction of the nuclei cannot be assigned to a clear cell type (presumably due to contamination or damaged nuclei). The authors chose to enrich for neurons using NeuN antibody staining and FACS. But it is difficult to assess the efficacy of this enrichment without images of the nuclear suspension obtained before FACS, and of the FACS results. As this field is in its infancy, more detail on preliminary experiments would help the reader to understand why the authors processed the tissue the way they did. It would be nice to know whether omitting the FACS procedure might in fact result in higher relative recovery of LC-neurons, or if the authors tried this and discovered other technical issues that prompted them to use FACS.

      Thank you for these comments. We agree these are valid concerns in assessing the data quality and validity of the findings from the snRNA-seq dataset. We will respond to these concerns here to the best of our ability, but in some cases, we do not have definitive answers since comparison data are not yet available for this region. In particular, we were limited in resources for this initial study -- some of the results of the study and issues that we identified in attempting to molecularly profile cells in the human LC were surprising to us, and we intend to generate additional samples and troubleshoot these issues to improve data quality and increase recovery in future work. However, these experiments are (i) expensive, (ii) time- and labor-intensive, and (iii) the tissue for this region is limited and difficult to ascertain. Given the extremely small size of the LC, the tissue resource is quickly depleted. For this study, we had fixed resources and made best-guess decisions on how to proceed with the experimental design, based on our experience with snRNA-seq in other human brain regions (Tran and Maynard et al. 2021). However, the LC is a unique region, and our experiences with this dataset will guide us to make technical adjustments in future studies. Due to the limitations in the tissue resources and the lack of data currently available to the community, we wanted to share these results immediately while acknowledging the limitations of the study as we work to increase our resource availability to expand molecular and spatial profiling studies in this region of the human brain.

      Regarding the reviewer’s concern that our choice to use FANS to enrich for neurons could have potentially led to more damage and contributed to the low recovery rate of LC-NE neurons and the mitochondrial contamination -- we do not have a definitive answer to this question, since we did not perform a direct comparison with non-sorted data. As noted above, our limited tissue resource dictated that we could not do both. We made the decision to enrich for neurons based on our previous experience with identifying relatively rare populations in other brain regions (e.g. nucleus accumbens and amygdala; Tran and Maynard et al. 2021). Based on this previous work, our rationale was that without neuronal enrichment, we could potentially miss the LC-NE population, given the relative scarcity of this neuronal population. The low recovery rate and relatively lower quality / contamination issues may be due to technical issues that lead to LC-NE neurons being more susceptible to damage during nuclear preparation and sorting. We agree that directly comparing to data prepared without NeuN labeling and sorting is reasonable, as the additional perturbations may indeed contribute to cell damage. As mentioned in the discussion, we do not have a definitive answer to the reasons for increased mitochondrial contamination and we suspect that multiple technical factors may contribute -- including the relatively large size and increased fragility of LC-NE neurons. We agree that systematically optimizing the preparation to attempt to increase recovery rate and decrease mitochondrial contamination are important avenues for future work.

      b. It is unclear what percentage of cells makes up each cluster.

      We will add this information in the clustering heatmaps or as a supplementary plot in a revised version of the manuscript.
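
      As a minimal sketch of the summary requested here, assuming each nucleus carries a single cluster label, the percentage of nuclei per cluster could be tabulated as follows. The function name `cluster_percentages` and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cluster_percentages(labels):
    """Return {cluster: percentage of all nuclei} for a list of per-nucleus labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}


# Hypothetical labels for illustration only; not values from the study.
example_labels = ["excitatory"] * 5 + ["inhibitory"] * 3 + ["NE"] * 2
percentages = cluster_percentages(example_labels)
for cluster, pct in sorted(percentages.items()):
    print(f"{cluster}: {pct:.1f}%")
```

      Reporting these percentages per donor as well as overall would also make any imbalance between samples visible.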

      c. The number of subjects used in each analysis was not always clear. Only 3 subjects were used for snRNAseq, and one of them only yielded 4 LC-nuclei. This means the results are essentially based on n=2. The authors report these numbers in the corresponding section, but the first sentence of the results section (and Figure 1C specifically!) create the impression that n=5 for all analyses. Even for spatial transcriptomics, if I understood it correctly, 1 sample had to be excluded (n=4).

      This is correct. We will update the figures and text in a revised version of the manuscript to make this limitation (small sample size) more clear, and to further emphasize that the intention of this study is to provide initial data to help determine next steps and best practices for a larger scale and more comprehensive study on this region, especially given the limited availability of tissue resources and currently limited data resources available for this region.

      2) Spatial transcriptomics:

      a. It is not clear to me what the spatial transcriptomics provides beyond what can be shown with snRNAseq, nor how these two sets of results compare to each other. It would be more intuitive to start the story with snRNAseq and then try to provide spatial detail using spatial transcriptomics. The LC is not a homogeneous structure but can be divided into ensembles based on projection specificity. Spatial transcriptomics could - in theory - offer much-needed insights into the spatial variation of mRNA profiles across different ensembles, or as a first step across the spatial (rostral/caudal, ventral/dorsal) extent of the LC. The current analyses, however, cannot address this issue, as the orientation of the LC cannot be deduced from the slices analyzed.

      We understand the point of the reviewer. However, we structured the manuscript in this format due to our aims of creating a data resource for the community as well as being transparent about the limitations of our study. Our experiments began with the spatial experiments on the tissue blocks because this (i) helped orient ourselves to the region, and (ii) provided guidance for how best to score the tissue blocks for the snRNA-seq experiments to maximize recovery of LC-NE neurons. Therefore, we also decided to present the results in this sequence.

      The spatial data also provides more information in that the measurements are from nuclei, cytoplasm, and cell processes (instead of nuclei only). This is one of the main differences / advantages between the platforms at this level of spatial resolution. As noted above, we were also working with a finite tissue resource -- if we ran snRNA-seq first and captured no neurons, the tissue block would be depleted. Due to the logistics / thickness of the required tissue sections for Visium and snRNA-seq respectively, running Visium first allowed us to ensure that we could collect data from both assays.

      Regarding a point raised below on why we only ran snRNA-seq on a subset of the donors -- this was due to resource depletion and not enough available tissue remaining on the tissue blocks to run the assay. We have conducted extensive piloting in other brain regions on the amount (mg) of tissue that is needed from various sized cryosections, and the LC is particularly difficult since these are small tissue blocks and the extent of the structure is small. Hence, in some of the subjects, we did not have sufficient tissue available for the snRNA-seq assay.

      We agree with the reviewer that spatial studies could, in future work, offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other small, challenging brain regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared to address these issues in future studies with larger numbers of donors and samples, e.g. using spaced serial sections across the extent of the LC, to gain these types of insights. Due to the rarity of the tissue, limited availability of information in this region, and high expense of conducting these studies, we want to share this initial data with the community immediately. We also note that in addition to the 10x Genomics Visium platform, which lacks cellular and sub-cellular resolution, many new and exciting spatial platforms are entering the market, which may be able to address questions in very small regions such as the LC at higher spatial resolution.

      b. Unfortunately, spatial transcriptomics itself is plagued by sampling variability to a point where the RNAscope analyses the authors performed prove more powerful in addressing direct questions about gene expression patterns. Given that the authors compare their results to published datasets from rodent studies, it is surprising that a direct comparison of genes identified with spatial transcriptomics vs snRNAseq is lacking (unless this reviewer missed this comparison). Supplementary Figure 17 seems to be a first step in that direction, but this is not a gene-by-gene comparison of which analysis identifies which LC-enriched genes. Such an analysis should not compare numbers of enriched genes using artificial cutoffs for significance/fold-change, but rather use correlations to get a feeling for which genes appear to be enriched in the LC using both methods. This would result in one list of genes that can serve as a reference point for future work.

      We agree this is a good suggestion, and will add additional computational analyses to address this point in a revised version of the manuscript.
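
      One way such a cutoff-free, gene-by-gene comparison could be set up is sketched below: a Spearman rank correlation of LC enrichment (e.g. log fold-changes) across the genes measured on both platforms. This is an illustrative sketch under our own assumptions, not the analysis that will appear in the revision. The helper names and all numeric values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_shared_genes(lfc_a, lfc_b):
    """Spearman correlation of enrichment values over genes present in both dicts."""
    genes = sorted(set(lfc_a) & set(lfc_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [lfc_a[g] for g in genes]
    y = [lfc_b[g] for g in genes]
    return genes, pearson(average_ranks(x), average_ranks(y))


# Hypothetical log fold-changes for illustration only; not results from the study.
visium_lfc = {"TH": 3.0, "DBH": 2.5, "SLC6A2": 2.0, "GAPDH": 0.1}
snrna_lfc = {"TH": 2.0, "DBH": 2.8, "SLC6A2": 1.5, "GAPDH": 0.0, "ACHE": 1.0}
shared_genes, rho = spearman_shared_genes(visium_lfc, snrna_lfc)
print(f"{len(shared_genes)} shared genes, Spearman rho = {rho:.2f}")
```

      A rank-based measure is used here because enrichment values from Visium spots and from single nuclei are unlikely to be on the same scale. Ranking the shared genes by their combined enrichment would then give the single reference list the reviewer describes.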

      c. Maybe the spatial transcriptomics could be useful to look at the peri-LC region, which has generated some excitement in rodent work recently, but remains largely unexplored in humans.

      We agree this is an excellent suggestion -- cross-species comparisons assessing convergence, especially of GABAergic cell populations, in the human LC are of high interest. We note that these types of extensions are exactly the reason why we have provided the publicly accessible web app (R/Shiny app, which includes the ability to annotate regions). We hope that others will use these apps for specialized topics they are interested in. As discussed above, we note that our initial dissections precluded the ability to keep track of the exact orientation of our tissue sections on the Visium arrays with respect to their location within the brainstem, so definitive localization of this region across subjects is difficult in our current study. However, it is possible, for example, to investigate whether there is a putative peri-LC region that is densely GABAergic and homologous with the GABAergic peri-LC region in rodents. We also draw attention to a recent preprint by Luskin and Li et al. (2022), who apply snRNA-seq and spatially-resolved transcriptomics to molecularly define both LC and peri-LC cell types in mice -- in a revised version of our manuscript, we will extend our computational analyses of inhibitory neuronal subtypes in our data (Supplementary Figures 13, 16) to directly compare with those identified in this study in more detail. As noted above, we have now developed a number of specialized technical and logistical strategies for keeping track of the orientation of sections from the tissue block onto a single spatial array. We feel that, combined with optimized dissection strategies for this region and the guide of RNAscope for GABAergic markers on serial sections, these strategies will make it possible to annotate the peri-LC region on spatial arrays in future studies.

      3) The comparison of snRNAseq data to published literature is laudable. Although the authors mention considerable methodological differences between the chosen rodent work and their own analyses, this needs to be further explained. The mouse dataset uses TRAPseq, which looks at translating mRNAs associated with ribosomes, very different from the nuclear RNA pool analyzed in the current work. The rat dataset used single-cell LC laser microdissection followed by microarray analyses, leading to major technical differences in terms of tissue processing and downstream analyses. The authors mention and reference a recent 10x mouse LC dataset (Luskin et al, 2022), however they only pick some neuropeptides from this study for their analysis of interneuron subtypes (Figure S13). Although this is a very interesting part of the manuscript, a more in-depth analysis of these two datasets would be very useful. It would likely allow for a better comparison between mouse and human, given that the technical approach is more similar (albeit without FACS), and Luskin et al have indicated that they are willing to share their data.

      As noted above, we plan to extend our comparisons with the dataset from Luskin and Li et al. (2022) in a revised version of the manuscript, which will provide a more in-depth cross-species comparison. In addition, we also note that there are some additional recent studies using TRAPseq of LC-NE neurons in a functional context, i.e. treatment vs. control experiments or in model systems (e.g. Iannitelli et al. 2023), which provide new opportunities for understanding disease context using in-depth cross-species comparisons. By providing our dataset and reproducible code, we will enable others to adapt and extend these types of comparisons (i.e. TRAPseq of LC-NE neurons or LC snRNA-seq following functional manipulations or in the context of disease or behavioral models) in the future.

      4) Statements in the manuscript about the unexpected identification of a 5-HT (serotonin) cell-cluster seem somewhat contradictory. Figure S14 suggests that 5-HT markers are expressed in the LC-regions just as much as anywhere else, but the RNAscope image in Figure S15 suggests spatial separation between these two populations. And Figure S17 again suggests almost perfect overlap between the LC and 5HT clusters. Maybe I misunderstood, in which case the authors should better clarify/explain these results.

      In our view, the most likely scenario is that the 5-HT neurons come from contamination from the dorsal raphe nucleus, based on the spatial separation seen in the RNAscope images, which we agree are more definitive. As mentioned above, since we do not have definitive documentation of the orientation of the tissue sections, it is difficult to say with certainty whether these regions are the dorsal raphe and, if so, which sub-portion of the dorsal raphe they represent. This initial study has now allowed us to optimize and improve our dissection strategy and approaches for retaining documentation of the orientation of the tissue sections from their intact position within the brainstem as they move from cryosection to placement on the array, which will enable us to better annotate regions with definitive anatomical information with respect to the rostral/caudal and dorsal/ventral axes in future experiments. Given that there are reports in the rodent that 5-HT markers have been identified in LC-NE neurons (Iijima 1993; Iijima 1989), and taking into account the technical limitations in our study, we felt that it was premature to conclude definitively in the manuscript that these signals arose from the dorsal raphe. We will update this language in a revised version of the manuscript to ensure that these limitations are clear (referring to Supplementary Figures S14-15, S17).

      Reviewer #2 (Public Review):

      The data generated for this paper provides an important resource for the neuroscience community. The locus coeruleus (LC) is the known seed of noradrenergic cells in the brain. Due to its location and size, it remains scarcely profiled in humans. Despite the physically minute structure containing these cells, its impact is wide-reaching due to the known neuromodulatory function of norepinephrine (NE) in processes like attention and mood. As such, profiling NE cells has important implications for most neurological and neuropsychiatric disorders. This paper generates transcriptomic profiles that are not only cell-specific but which also maintain their spatial context, providing the field with a map for the cells within the region.

      Strengths:

      Using spatial transcriptomics in a morphologically distinct region is a very attractive way to generate a map. Overlaying macroscopic information, i.e. a region with greater pigmentation, with its corresponding molecular profile in an unbiased manner is an extremely powerful way to understand the specific cellular and molecular composition of that brain structure.

      The technologies were used with an astute awareness of their limitations; as such, multiple technologies were leveraged to paint a more complete and resolved picture of the cellular composition of the region. For example, the lack of resolution in the spatial transcriptomic platform was compensated by complementary snRNA-seq and single molecule FISH.

      This work has been made publicly available and accessible through a user-friendly application such that any interested researcher can investigate the level of expression of their gene of interest within this region.

      Two important implications from this work are 1) the potential that the gene regulatory profiles of these cells are only partially conserved across species (humans and rodents), and 2) that there may be other neuromodulatory cell types within the region that were not previously localized to the LC.

      Weaknesses:

      Given that the markers used to identify cells are not as specific as they need to be to definitively qualify the desired cell type, the results may be over-interpreted. Specifically, TH is the primary marker used to qualify cells as noradrenergic; however, TH catalyzes the synthesis of L-DOPA, a precursor to dopamine, which in turn is a precursor for epinephrine and norepinephrine, suggesting some of the cells in the region may be dopaminergic and not NE cells. Indeed, there are publications to support the presence of dopaminergic cells in the LC (see Kempadoo et al. 2016, Takeuchi et al., 2016, Devoto et al. 2005). This discrepancy is further highlighted by the apparent lack of overlap of TH, SLC6A2, and DBH within individual Visium spots. While the single-molecule FISH confirms that some of the cells in the region are noradrenergic, others very possibly represent a different catecholamine. As such, it is suggested that the nomenclature for the cells be reconsidered.

      We appreciate the reviewer’s comment, and are aware of the reports suggesting the potential presence of dopaminergic cells in the LC. We initially had the same thought as the reviewer when we observed Visium spots in the spatial data with lack of overlap between TH, SLC6A2, and DBH as well as single nuclei in the snRNA-seq data with lack of overlap between TH, SLC6A2, and DBH. This surprising result was exactly why we performed the smFISH/RNAscope experiment with these three marker genes. Given known issues with read depth and coverage in the 10x Genomics assays, we wanted to better understand if this was a technical limitation in the sequencing coverage, or rather a true biological finding. The RNAscope data showed very clearly that nearly every cell body we looked at had co-localization of these three marker genes. We included an image from a single capture array of one tissue section in Supplementary Figure 11, but could, in a revised version of the manuscript, provide additional examples to illustrate how conclusive the images were by visualization. As such, we were quite convinced that the lack of overlap on Visium spots and in single nuclei in the snRNA-seq data was more likely related to technical issues with sequencing coverage, rather than a biological finding. We also note that we checked for the presence of the dopamine transporter, SLC6A3, and as can be appreciated in the iSEE web app for the snRNA-seq data or the R/Shiny web app for the Visium data, there is virtually no expression of SLC6A3 in the dataset, which in our view provides additional evidence against the possibility that there are substantial quantities of dopaminergic cells in this human LC dataset. We will include supplementary plots showing the lack of SLC6A3 expression in a revised version of the manuscript.

      The authors are unable to successfully implement unsupervised clustering with the spatial data. This greatly reduces the impact of the spatial technology, as it implies that the transcriptomic data generated in the study did not have enough resolution to identify individual cell types.

      The reviewer is correct -- this is a fundamental limitation of the 10x Genomics Visium platform, i.e. the spatial resolution captures multiple cells per spot (e.g. around 1-10 cells per spot in human brain tissue). We note that new spatial platforms now provide cellular resolution (e.g. Vizgen MERSCOPE, 10x Genomics Xenium, 10x Genomics Visium HD), which will help address this in future work. However, many of these cellular-resolution in situ sequencing platforms have the limitation that they do not quantify genome-wide expression, and instead require users to select a priori gene panels to investigate. This is a problem if no genome-wide reference datasets are available. Hence, despite the limited spatial resolution of the Visium platform, this dataset is useful precisely for helping investigators choose gene panels for higher-resolution platforms or higher-order smFISH multiplexing.

      We also applied spatial clustering (using BayesSpace; Zhao et al. 2021) to attempt to segment the LC regions within the Visium samples in a data-driven manner as an alternative to the manual annotations, which was unsuccessful (and hence we relied on the manually annotated regions for downstream analyses) (Supplementary Figure S5). However, this is a different application of unsupervised clustering, which is separate from the task of identifying cell types.

      The sample contribution to the results is highly unbalanced, which consequently, may result in ungeneralizable findings in terms of regional cellular composition, limiting the usefulness of the publicly available data.

      We acknowledge the limitations of the work due to the small/unbalanced sample sizes. As mentioned above for Reviewer 1, this was an initial study in this region, the results of which will inform our (and hopefully others’) experimental design and approach to molecular profiling in this difficult-to-access brain region. Overall, this study was executed with finite tissue and financial resources and was intended to uncover limitations and help develop best practices and design workflows for future studies with larger numbers of donors and samples. Given the limited data availability for this brain region, we wanted to make this dataset available for the research community immediately. In addition, we note that making this genome-wide dataset available will help inform targeted gene panel design for higher-resolution platforms (e.g. 10x Genomics Xenium).

      This study aimed to deeply profile the LC in humans and provide a resource to the community. The combination of data types (snRNA-seq, SRT, smFISH) does in fact represent this resource for the community. However, due to the limitations, some of which were described in the manuscript, we should be cautious in the use of the data for secondary analysis. For example, some of the cellular annotations may lack precision, the cellular composition may not reflect the general population, and the presence of unexpected cell types may represent the accidental inclusion of adjacent regions, in this case serotonergic cells from the raphe nucleus.

      We agree, and have attempted to explain these limitations in the manuscript. We will clarify the language regarding the interpretation of the annotated cell populations and unexpected cell types, and the limited sample sizes, in a revised version of the manuscript.

      Nonetheless having a well-developed app to query and visualize these data will be an enormous asset to the community especially given the lack of information regarding the region in general.

      Reviewer #3 (Public Review):

      […] This study has many strengths. It is the first reported comprehensive map of the human LC transcriptome, and uses two independent but complementary approaches (spatial transcriptomics and snRNA-seq). Some of the key findings confirmed what has been described in the rodent LC, as well as some intriguing potential genes and modules identified that may be unique to humans and have the potential to explain LC-related disease states. The main limitations of the study were acknowledged by the authors and include the spatial resolution probably not being at the single cell level and the relatively small number of samples (and questionable quality) for the snRNA-seq data. Overall, the strengths greatly outweigh the limitations. This dataset will be a valuable resource for the neuroscience community, both in terms of methodology development and results that will no doubt enable important comparisons and follow-up studies.

      Major comments:

      Overall, the discovery of some cells in the LC region that express serotonergic markers is intriguing. However, no evidence is presented that these neurons actually produce 5-HT.

      The reviewer is correct that we did not provide any additional evidence to show that these neurons actually produce 5-HT. As noted above in the response to Reviewer 1, in our view, the most likely explanation is that these neurons are from dorsal raphe contamination on the tissue section. However, due to technical and logistical limitations in this study, we could not definitively say this because we did not clearly track the orientation of the tissue sections, and we did not have remaining tissue sections from all donor tissue blocks to repeat RNAscope experiments. For some of the donors, where we had remaining tissue sections to go back to repeat RNAscope experiments after completion of the snRNA-seq and Visium assays, we could see clear separation of the LC region / LC-NE neuron core from where putative 5-HT neurons were located (Supplementary Figure 15). However, we did not have sufficient tissue resources to map this definitively in all donors, and the orientation and anatomy of each tissue block were not fully annotated.

      Due to this lack of clarity, and the fact that there have been reports that LC-NE neurons express serotonergic markers (Iijima 1993; Iijima 1989), we felt that it was premature to declare that the putative 5-HT neurons we identified were definitively from the raphe. We will clarify the language around this discrepancy in a revised version of the manuscript to ensure that these limitations are clearly described.

      Concerning the snRNA-seq experiments, it is unclear why only 3 of the 5 donors were used, particularly given the low number of LC-NE nuclear transcriptomes obtained, why those 3 were chosen, and how many 100 μm sections were used from each donor. It is also unclear if the 295 nuclei obtained are truly representative of the LC population or whether they are just the most "resilient" LC nuclei that survive the process.

      As discussed above for Reviewer 1, the reason we included only 3 of the 5 donors for the snRNA-seq assays was due to the tissue availability on the tissue blocks. We will clarify the language in a revised version of the manuscript to make this limitation more clear. We will also include additional details in the Methods section on the number of 100 μm sections used for each donor (which varied between 10-15, approximating 60-80 mg of tissue).

      The LC displays rostral/caudal and dorsal/ventral differences, including where they project, which functions they regulate, and which parts are vulnerable in neurodegenerative disease (e.g. Loughlin et al., Neuroscience 18:291-306, 1986; Dahl et al., Nat Hum Behav 3:1203-14, 2019; Beardmore et al., J Alzheimer's Dis 83:5-22, 2021; Gilvesy et al., Acta Neuropathol 144:651-76, 2022; Madelung et al., Mov Disord 37:479-89, 2022). It was not clear which part(s) of the LC was captured for the SRT and snRNAseq experiments.

      As discussed above for Reviewer 1, a limitation of this study was that we did not record the anatomical orientation of the tissue sections, precluding our ability to annotate the tissue sections with the rostral/caudal and dorsal/ventral axis labels. We agree with the reviewer that additional spatial studies, in future work, could offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other small, challenging brain regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared to address these issues in future studies with larger numbers of donors and samples in order to gain these types of insights.

      The authors mention that in other human SRT studies, there are typically between 1-10 cells per expression spot. I imagine that this depends heavily on the part of the brain being studied and neuronal density, but it was unclear how many LC cells were contained in each expression spot.

      The reviewer is correct that we did not include this information in the manuscript. We attempted to apply a computational method to count nuclei contained in each gene expression spot based on analyzing the histological H&E images (VistoSeg; Tippani et al. 2022), which we have developed and previously applied in data from the dorsolateral prefrontal cortex (DLPFC) (Maynard and Collado-Torres et al. 2021). Based on the segmentation using this workflow we observe that the counts in this region are similar to what we observed in the DLPFC, i.e., typically between 1-10 LC cells per expression spot, with approximately 1-2 LC-NE neurons (which are characterized by their large size) per expression spot. However, these analyses had several technical issues related to the images themselves, the relatively large size and pigmentation of LC-NE neurons, and parameter settings that had been optimized for different brain regions. We are currently optimizing this analysis workflow for these images to provide more accurate estimates of cell counts per spot to give readers additional context on the number of nuclei per spot in the annotated LC regions and outside the LC regions in a revised version of the manuscript.

      Regarding comparison of human LC-associated genes with rat or mouse LC-associated genes (Fig. 2D-F), the authors speculate that the modest degree of overlap may be due to species differences between rodents and humans and/or methodological differences (SRT vs microarray vs TRAP). Was there greater overlap between mouse and rat than between mouse/rat and human? If so, that is evidence for the former. If not, that is evidence for the latter. A more in-depth comparison with snRNA-seq data from mouse LC would also be useful: https://www.biorxiv.org/content/10.1101/2022.06.30.498327v1.

      We will investigate this question and discuss this in updated results in a revised version of the manuscript.
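      The comparison the reviewer proposes comes down to pairwise overlap between marker gene lists. The following is a minimal sketch of that calculation, not the authors' analysis: the function names and the gene sets are hypothetical placeholders, and it assumes gene identifiers have already been mapped to a shared ortholog namespace across species.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by genes present in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Return {(name1, name2): (n_shared, jaccard)} for every pair of sets."""
    return {
        (n1, n2): (len(set(g1) & set(g2)), jaccard(g1, g2))
        for (n1, g1), (n2, g2) in combinations(gene_sets.items(), 2)
    }


# Hypothetical placeholder sets for illustration only, not study results.
example_sets = {
    "human": {"GENE_1", "GENE_2", "GENE_3", "GENE_4"},
    "mouse": {"GENE_1", "GENE_2", "GENE_5", "GENE_6"},
    "rat": {"GENE_1", "GENE_2", "GENE_5", "GENE_7"},
}
overlaps = pairwise_overlap(example_sets)
for pair, (n_shared, score) in overlaps.items():
    print(pair, n_shared, round(score, 3))
```

      In this toy input the mouse and rat sets overlap more with each other than either does with the human set, which is the pattern the reviewer suggests would point toward species differences rather than methodological ones.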

      The finding of ACHE expression in LC neurons is intriguing, especially in light of work from Susan Greenfield suggesting that ACHE has functions independent of ACH metabolism that contributes to cellular vulnerability in neurodegenerative disease.

      We thank the reviewer for pointing this out. We were very surprised too by the observed expression of SLC5A7 and ACHE in the LC regions (Visium data) and within the LC-NE neuron cluster (snRNA-seq data), coupled with absence of other typical cholinergic marker genes (e.g. CHAT, SLC18A3), and we do not have a compelling explanation or theory for this. Hence, the work of Susan Greenfield and colleagues suggesting non-cholinergic actions of ACHE, particularly in other catecholaminergic neurons (e.g. dopaminergic neurons in the substantia nigra) is very interesting. We will include references to this work and how it could inform interpretation of this expression in a revised version of the manuscript (Greenfield 1991; Halliday and Greenfield 2012).

      High mitochondrial reads from snRNA-seq can indicate lower quality. It was not clear why, given the mitochondrial read count, the authors are confident in the snRNA-seq data from presumptive LC-NE neurons.

      We will include additional analyses to further investigate and/or confirm this finding (e.g. comparing sum of UMI counts / number of detected genes and mitochondrial percentage per nucleus for this population to confirm data quality) in additional supplementary figures in a revised version of the manuscript.
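      The three quality metrics named above can be computed per nucleus from a raw count matrix. The sketch below is illustrative only: the function name, the toy counts, and the use of an "MT-" prefix to identify mitochondrial genes are assumptions, not details taken from the manuscript.

```python
def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-nucleus QC from a list of UMI-count rows (one row per nucleus).

    Returns, for each nucleus, the sum of UMI counts, the number of
    detected genes, and the percentage of counts from mitochondrial genes.
    """
    # Assumption: mitochondrial genes are identified by a symbol prefix.
    mito_idx = [i for i, g in enumerate(gene_names) if g.startswith(mito_prefix)]
    metrics = []
    for row in counts:
        total = sum(row)
        detected = sum(1 for c in row if c > 0)
        mito = sum(row[i] for i in mito_idx)
        pct_mito = 100.0 * mito / total if total else 0.0
        metrics.append(
            {"sum_umi": total, "detected_genes": detected, "pct_mito": pct_mito}
        )
    return metrics


# Toy counts for two hypothetical nuclei; not data from the study.
genes = ["GENE_1", "GENE_2", "MT-CO1", "MT-ND1"]
toy_counts = [[5, 0, 3, 2], [10, 10, 0, 0]]
toy_qc = qc_metrics(toy_counts, genes)
print(toy_qc)
```

      Comparing the distributions of these values between the presumptive LC-NE cluster and the remaining nuclei is one way to see whether a high mitochondrial percentage coincides with low UMI sums and few detected genes, which would be more consistent with poor-quality nuclei.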

      References

      • Greenfield (1991), A noncholinergic action of acetylcholinesterase (AChE) in the brain: from neuronal secretion to the generation of movement, Cellular and Molecular Neurobiology, 11, 1, 55-77.

      • Halliday and Greenfield (2012), From protein to peptides: a spectrum of non-hydrolytic functions of acetylcholinesterase, Protein & Peptide Letters, 19, 2, 165-172.

      • Iannitelli et al. (2023), The neurotoxin DSP-4 dysregulates the locus coeruleus-norepinephrine system and recapitulates molecular and behavioral aspects of prodromal neurodegenerative disease, eNeuro, 10, 1, ENEURO.0483-22.2022.

      • Iijima K. (1989), An immunocytochemical study on the GABA-ergic and serotonin-ergic neurons in rat locus ceruleus with special reference to possible existence of the masked indoleamine cells, Acta Histochemica, 87, 1, 43-57.

      • Iijima K. (1993), Chemocytoarchitecture of the rat locus ceruleus, Histology and Histopathology, 8, 3, 581-591.

      • Luskin A.T., Li L. et al. (2022), A diverse network of pericoerulear neurons control arousal states, bioRxiv (preprint).

      • Maynard and Collado-Torres et al. (2021), Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex, Nature Neuroscience, 24, 425-436.

      • Tippani et al. (2022), VistoSeg: processing utilities for high-resolution Visium/Visium-IF images for spatial transcriptomics data, bioRxiv (preprint).

      • Tran M.N., Maynard K.R. et al. (2021), Single-nucleus transcriptome analysis reveals cell-type-specific molecular signatures across reward circuitry in the human brain, Neuron, 109, 3088-3103.

      • Zhao E. et al. (2021), Spatial transcriptomics at subspot resolution with BayesSpace, Nature Biotechnology, 39, 1375-1384.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the public reviewers and editors for their insightful comments on the manuscript. We have made the following changes to address their concerns and think the manuscript is stronger as a result. Specifically, we have 1) added RNA FISH data of specific STB-2 and STB-3 RNA markers to confirm their distribution changes between STB<sup>in</sup> and STB<sup>out</sup> TOs, 2) removed language throughout the text that refers to STB-3 as a terminally differentiated nuclear subtype, and 3) generated CRISPR-mediated knock-outs of two genes identified by network analysis and validated their roles in mediating STB nuclear subtype gene expression.

      Reviewer #1 (Public review): 

      Strengths: 

      The study offers a comprehensive SC- and SN-based characterization of trophoblast organoid models, providing a thorough validation of these models against human placental tissues. By comparing the older STB<sup>in</sup> and newer STB<sup>out</sup> models, the authors effectively demonstrate the improvements in the latter, particularly in the differentiation and gene expression profiles of STBs. This work serves as a critical resource for researchers, offering a clear delineation of the similarities and differences between TO-derived and primary STBs. The use of multiple advanced techniques, such as high-resolution sequencing and trajectory analysis, further enhances the study's contribution to the field. 

      Thank you for your thoughtful review—we appreciate your recognition of our efforts to comprehensively validate trophoblast organoid models and highlight key advancements in STB differentiation and gene expression.

      Weaknesses: 

      While the study is robust, some areas could benefit from further clarification. 

      (1) The importance of the TO model's orientation and its impact on outcomes could be emphasized more in the introduction. 

      We agree that TO orientation may significantly influence STB nuclear subtype differentiation. As the STB is critical for both barrier formation and molecular transport in vivo, lack of exposure to the surrounding media in STB<sup>in</sup> TOs in vitro could compromise these functions and the associated environmental cues that influence STB nuclear differentiation. We have added text to the introduction to highlight this point (lines 117-120).

      (2) The differences in cluster numbers/names between primary tissue and TO data need a clearer explanation, and consistent annotation could aid in comparison. 

      Thank you for highlighting that the comparisons and cluster annotations need clarification. In Figure 1, we did not aim to directly compare CTB and STB nuclear subtypes between TOs and tissue. Each dataset was analyzed independently, with clusters determined separately and at different resolutions selected by a clustering algorithm (Zappia and Oshlack, 2018). For example, for the STB, this approach identified seven subtypes in tissue but only two in TOs, making direct comparison challenging. To address this challenge, we integrated the SN datasets from TOs and tissue in Figure 6. This integration allowed us to directly compare gene expression between the sample types and examine the proportions within each STB subtype. Similarly, in Figure 2, direct comparison of individual CTB or STB clusters across the separate datasets is challenging (Figures 2A-C) due to differences in clustering. To overcome this, we integrated the datasets to compare cluster gene expression and relative proportions (Figures 2D-E). Nonetheless, to address the reviewer's concern, we have added text to the results section to clarify that subclusters of CTB and STB should not be directly compared between datasets until the datasets are integrated in Figure 2D-E and Figure 6 (lines 166-167).
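      Once nuclei from different sample types share one set of cluster labels, relative proportions can be compared directly. The following is a minimal sketch of that bookkeeping, assuming integration and clustering have already been done upstream; the function name and the toy labels are hypothetical and do not reproduce the authors' pipeline or results.

```python
from collections import Counter, defaultdict


def cluster_proportions(sample_labels, cluster_labels):
    """Fraction of each sample's nuclei assigned to each shared cluster."""
    counts = defaultdict(Counter)
    for sample, cluster in zip(sample_labels, cluster_labels):
        counts[sample][cluster] += 1
    return {
        sample: {c: n / sum(per_cluster.values()) for c, n in per_cluster.items()}
        for sample, per_cluster in counts.items()
    }


# Toy labels for six hypothetical nuclei after integration.
samples = ["TO", "TO", "TO", "TO", "tissue", "tissue"]
clusters = ["STB-1", "STB-1", "STB-2", "STB-3", "STB-1", "STB-2"]
props = cluster_proportions(samples, clusters)
print(props)
```

      The same calculation on independently clustered datasets would not be meaningful, because a label such as STB-2 would not refer to the same group of nuclei in each dataset.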

      (3) The rationale for using SN sequencing over SC sequencing for TO evaluations should be clarified, especially regarding the potential underrepresentation of certain trophoblast subsets. 

      This is an important point, as the challenges of studying a giant syncytial cell are often underappreciated by researchers who study mononucleated cells. We have added text to the introduction to clarify why traditional single cell RNA sequencing techniques were inadequate to collect and characterize the STB (lines 91-93).

      (4) Additionally, more evidence could be provided to support the claims about STB differentiation in the STB<sup>out</sup> model and to determine whether its differentiation trajectory is unique or simply more advanced than in STB<sup>in</sup>. 

      Our original conclusion that STB<sup>out</sup> nuclei are more terminally differentiated than STB<sup>in</sup> was based on two observations: (1) STB<sup>out</sup> TOs exhibit increased expression of STB-specific pregnancy hormones and many classic STB marker genes and (2) STB<sup>out</sup> nuclei show an enrichment of the STB-3 nuclear subtype, which appears at the end of the slingshot pseudotime trajectory. However, upon consideration of the reviewer comments, we agree that this evidence is not sufficient to definitively distinguish whether STB<sup>out</sup> nuclei are more advanced or follow a unique differentiation trajectory dependent on new environmental cues. Pseudotime analyses provide only a predictive framework for lineage tracing, and these predictions must be experimentally validated. Real-time tracking of STB nuclear subtypes in TOs would require a suite of genetic tools beyond the scope of this study. Therefore, to address the reviewers' concerns, we have removed language suggesting that STB-3 is a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei throughout the text until the discussion. There we present both our original hypothesis (that STB nuclei are further differentiated in STB<sup>out</sup>) and alternative explanations such as changing trajectories due to local environmental cues (lines 619-625).

      Reviewer #2 (Public review): 

      Strengths: 

      (1) The use of SN and SC RNA sequencing provides a detailed analysis of STB formation and differentiation. 

      (2) The identification of distinct STB subtypes and novel gene markers such as RYBP offers new insights into STB development. 

      Thank you for highlighting these strengths—we appreciate your recognition of our use of SN and SC RNA sequencing to analyze STB differentiation and the discovery of distinct STB subtypes and novel gene markers like RYBP.

      Weaknesses: 

      (1) Inconsistencies in data presentation. 

      We address the individual comments of reviewer 2 later in this response.

      (2) Questionable interpretation of lncRNA signals: The use of long non-coding RNA (lncRNA) signals as cell type-specific markers may represent sequencing noise rather than true markers. 

      We appreciate the reviewer’s attention to detail in noticing the lncRNA signature seen in many STB nuclear subtypes. However, we disagree that these molecules simply represent sequencing noise. In fact, many studies have rigorously demonstrated that lncRNAs have both cell- and tissue-specific gene expression (e.g., Zhao et al 2022, Isakova et al 2021, Zheng et al 2020). Further, they have been shown to be useful markers of unique cell types during development (e.g., Morales-Vicente et al 2022, Zhou et al 2019, Kim et al 2015) and can enhance clustering interpretability in breast cancer (Malagoli et al 2024). Many lncRNAs have also been demonstrated to play a functional role in the human placenta, including H19, MEG3, and MEG8 (Adu-Gyamfi et al 2023), and differences are even seen in nuclear subtypes in trophoblast stem cells (Khan et al 2021). Therefore, we prefer to keep these lncRNA signatures included and let future researchers test their functional role.

      To improve the study's validity and significance, it is crucial to address the inconsistencies and to provide additional evidence for the claims. Supplementing with immunofluorescence staining for validating the distribution of STB_in, STB_out, and EVT_enrich in the organoid models is recommended to strengthen the results and conclusions. 

      Each general trophoblast cell type (CTB, STB, EVT) has been visualized by immunofluorescence by the Coyne laboratory in their initial papers characterizing the STB<sup>in</sup>, STB<sup>out</sup>, and EVT<sup>enrich</sup> models (Yang et al, 2022 and 2023). We agree that it is important to validate the STB nuclear subtypes found in our genomic study. However, one challenge in studying a syncytium is that immunofluorescence may not be a definitive method when the nuclei share a common cytoplasm. This is because protein products from mRNAs transcribed in one nucleus are translated in the cytoplasm and could diffuse beyond sites of transcription. Therefore, RNA fluorescence in situ hybridization (RNA-FISH) is needed instead. While a systematic characterization of the spatial distribution of the many marker genes found in each subtype is outside the scope of this study, we include RNA-FISH of one STB-2 marker (PAPPA2) and one STB-3 marker (ADAMTS6) in Figure 3F-G and Supplemental Figure 3.3. This demonstrates that there is an increase in STB-2 marker gene expression in STB<sup>in</sup> TOs and an increase in STB-3 marker gene expression in STB<sup>out</sup> TOs. 

      Reviewer #3 (Public review):  

      The authors present outstanding progress toward their aim of identifying, "the underlying control of the syncytiotrophoblast". They identify the chromatin remodeler, RYBP, as well as other regulatory networks that they propose are critical to syncytiotrophoblast development. This study is limited in fully addressing the aim, however, as functional evidence for the contributions of the factors/pathways to syncytiotrophoblast cell development is needed. Future experimentation testing the hypotheses generated by this work will define the essentiality of the identified factors to syncytiotrophoblast development and function. 

      We thank the reviewer for their thoughtful assessment, constructive feedback, and encouraging comments. We acknowledge that the initial manuscript primarily presented analyses suggesting correlations between RYBP and other factors identified in the gene network analysis and STB function. Understanding how gene networks in the STB are formed and regulated is a long-term goal that will require many experiments with collaborative efforts across multiple research groups.

      Nonetheless, to address this concern we have knocked out two key genes, RYBP and AFF1, in TOs using CRISPR-Cas9-mediated gene targeting. Bulk RNA sequencing of STB<sup>in</sup> TOs from both wild-type (WT) and knockout strains revealed that deletion of either gene caused a statistically significant decrease in the expression of the pregnancy hormone human placental lactogen and an increase in the expression of several genes characteristic of the oxygen-sensing STB-2 subtype, including FLT-1, PAPPA2, SPON2, and SFXN3. These findings demonstrate that knocking out RYBP or AFF1 increases STB-2 marker gene expression, and therefore that both genes play a role in inhibiting the expression of these markers in WT TOs (Figure 5D-E and Supplemental Figure 5.2). We also note that this is the first application of CRISPR-mediated gene silencing in a TO model.

      Future work will visualize the distribution of STB nuclear subtypes in these mutants and explore the mechanistic role of RYBP and AFF1 in STB nuclear subtype formation and maintenance. However, these investigations fall outside the scope of the current study.

      Localization and validation of the identified factors within tissue and at the protein level will also provide further contextual evidence to address the hypotheses generated. 

      We agree that visualizing STB nuclear subtype distribution is essential for testing the many hypotheses generated by our analysis. To address this, we have included RNA-FISH experiments for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3) in TOs. These experiments reveal an increase in PAPPA2 expression in STB<sup>in</sup> TOs and an increase in ADAMTS6 expression in STB<sup>out</sup> TOs (Figure 3F-G and Supplemental Figure 3.3). Genomic studies serve as powerful hypothesis generators, and we look forward to future work—both our own and that of other researchers—to validate the markers and hypotheses presented from our analysis.

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      We strongly encourage the authors to further strengthen the study by addressing all reviewers' comments and recommendations, with particular attention to the following key aspects:

      (1) Clarifying the uniqueness of the STB differentiation trajectory between STB<sup>in</sup> and STB<sup>out</sup>, and determining whether STB<sup>out</sup> represents a more advanced stage of differentiation compared to STB<sup>in</sup>. It is also important to specify which developmental stage of placental villi the STB<sup>out</sup> and STB<sup>in</sup> are simulating. 

      We have revised the manuscript to remove definitive language claiming that STB-3 represents a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei. Instead, we now present our hypothesis and alternative explanations in the discussion (lines 619-625), and emphasize the need for experimental validation of pseudotime predictions to test these hypotheses.

      (2) Utilizing immunofluorescence to validate the distribution of cell types in the organoid models. 

      The Coyne lab has previously performed immunofluorescence of CTB and STB markers in STB<sup>in</sup> and STB<sup>out</sup> TOs (Yang et al 2023). The syncytial nature of STBs complicates immunofluorescence-based validation of the STB nuclear subtypes because translated proteins all share a single common cytoplasm and can therefore diffuse and mix. Instead, we performed RNA-FISH for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3), which showed subtype-specific nuclear enrichment in STB<sup>in</sup> and STB<sup>out</sup> TOs, respectively (Figure 3F-G and Supplemental Figure 3.3).  

      (3) Addressing concerns regarding the use of lncRNA as cell marker genes. Employing canonical markers alongside critical TFs involved in differentiation pathways to perform a more robust cell-type analysis and validation is recommended.  

      As discussed in detail above, we maintain that lncRNAs are valuable markers, supported by their demonstrated roles in cell and tissue specificity and placental function. These signatures provide important insights and hypotheses for future research, and we have clarified this rationale in the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors have presented an extensive SC- and SN-based characterization of their improved trophoblast TO model, including a comparison to human placental tissues and the previous TO iteration. In this way, the authors' work represents an invaluable resource for investigators by providing thorough validation of the TO model and a clear description of the similarities and differences between primary and TO-derived STBs. I would suggest that the authors reshape the study to further highlight and emphasize this aspect of the study. 

      We thank the reviewer for their thoughtful recommendation and agree that our datasets will serve as an invaluable resource for comparing in vitro models to in vivo gene expression. However, extensive validation is required to make definitive conclusions about the extent to which these systems mirror one another and where they diverge. For this reason, in this manuscript, we have focused on characterizing STB subtypes to provide a foundational understanding of the model and this poorly characterized subtype.

      (2) Introduction, Paragraph 3: What is the importance of orientation for the trophoblast TO model? The authors may consider removing some of the less important methodologic details from this paragraph and including more emphasis on why their TO model is an improvement. 

      Text has been added to this paragraph to highlight the importance of outward facing STB orientation, which is essential to mirror the STB’s transport function in vitro (lines 118-120).

      (3) Results, Figure 1: In addition to the primary placental tissue plots showing all cell populations, it may be useful to have side-by-side versions of similar plots showing only the trophoblast subsets, so that the primary and TO data could be more easily compared visually. 

      This has been implemented and added to the Supplemental Figure 1.4.

      (4) Results, Figure 1: In simple terms, what is the reason for ending up with different cluster numbers/names from the primary tissue and TO? Would it be possible to apply the same annotation to each (at least for trophoblast types) and thus allow direct comparison between the two? 

      As described above, each dataset was analyzed separately, and its clusters were determined with an algorithm that selects the optimal clustering resolution. Therefore, the number of clusters cannot be directly compared between datasets until the SN TO and tissue datasets are integrated in Figure 6. We have added text to the manuscript to make it clear that, until this point, they should not be compared except in bulk number (lines 230-232).

      (5) Results, Figure 2: For subsequent evaluation of different in vitro TO conditions, did the authors use only SN sequencing because they wanted to focus on STB? Based on Figure 1, it seems some CTB subsets would be underrepresented if using only SN. Given that the authors look at both STB and CTB in their different TOs, is this an issue? 

      The CTB clusters that showed the greatest divergence between SC and SN datasets were those associated with mitosis and the cell cycle, likely due to nuclear envelope breakdown interfering with capture by the 10x microfluidics pipeline. While cytoplasmic gene expression provides valuable insights into CTB function, our manuscript focuses on the STB starting from Figure 2. Since the STB is captured exclusively by the SN dataset, we concentrated on this approach to streamline our analysis.

      (6) Results, Figure 3: What do the authors consider to be the primary contributing factors for why the STB subsets display differential gene expression between STB<sup>in</sup> and STB<sup>out</sup>? Is this due primarily to the culture conditions and/or a result of the differing spatial arrangement with CTBs? 

      This is an intriguing question that is challenging to disentangle because the culture conditions are integral to flipping the orientation. The two primary factors that differ between STB<sup>in</sup> and STB<sup>out</sup> TOs are the presence of extracellular matrix in STB<sup>in</sup> and direct exposure to the surrounding media in STB<sup>out</sup>. We believe these environmental cues play a significant role in shaping the gene expression of STB subsets. Fully disentangling this relationship would require a method to alter the TO orientation without changing the culture conditions. While this is an exciting direction for future research, it falls outside the scope of the present study.

      (7) Results, Figure 4: The authors' analysis indicates that the STB nuclei from the STB<sup>out</sup> TO are likely "more differentiated" than those in STB<sup>in</sup> TO. Could the authors provide some qualitative or quantitative support for this? Is the STB<sup>out</sup> differentiated phenotype closer to what would be observed in a fully formed placenta? 

      As discussed earlier, we agree with the reviewers that this claim should be removed from the text outside of the discussion.

      (8) Results, Figure 5: Based on the trajectory analysis, do the authors consider that the STB from STB<sup>out</sup> TO are simply further along the differentiation pathway compared to those from STB<sup>in</sup> TO, or do the STB from STB<sup>out</sup> TO follow a differentiation pathway that is intrinsically distinct from STB<sup>in</sup> TO? 

      We think the idea of an intrinsically distinct pathway is a fascinating alternative hypothesis and have added it to the discussion. We do not find that the pseudotime analysis currently allows us to answer this question without additional experiments, so we have removed claims that the STB<sup>out</sup> STB nuclei are further along the differentiation pathway.

      (9) Results, Figure 6: A notable difference between the STB<sup>out</sup> TO and the term tissue is that the CTB subsets are much more prevalent. Is this simply a scale difference, i.e. due to the size of the human placenta compared to the limited STB nuclei available in the STB<sup>out</sup> TO? Or are there other contributing factors? 

      The proportion of STB to CTB nuclei in our term tissue (9:1) aligns with expectations based on stereological estimates. We believe the relatively low number of CTB nuclei in our dataset is due to the need for a larger sample size to capture more of this less abundant cell type. Since the primary focus of this paper is on STB, and we analyzed over 4,000 STB nuclei, we do not view this as a limitation. However, future studies utilizing SN to investigate term tissue should account for the abundance of STB nuclei and plan their sampling carefully to ensure sufficient representation of CTB nuclei if this is a desired focus.

      Reviewer #2 (Recommendations for the authors): 

      (1) The color annotations for cell types in Figure 2 are inconsistent between the different panels, and the term "Prolif" in Figure 2E is not explained by the authors. 

      We chose colors to enhance visibility on the UMAP. We do not wish readers to make direct comparisons between the different CTB or STB subtypes of the sample types until the datasets are integrated in Figure 2D. This is because the clustering resolution was chosen algorithmically and independently for each dataset. Cluster proportions are better compared in the integrated datasets in Figure 2D. We have added text to the results section to make this clear to the reader (lines 166-167).

      (2) In Figure 3 and Supplementary Figures 1.3, the authors frequently present long non-coding RNA (lncRNA) signals as cell type-specific markers in the bubble plots. These signals are likely sequencing noise and may not accurately represent true markers for those cell types. It is recommended to revise this interpretation. 

      As referenced above, there are many examples of lncRNAs that have biological and pathological significance in the placenta (H19, Meg3, Meg8) and lncRNAs often have cell type specific expression that can enhance clustering. We prefer to keep these signatures included and let future researchers determine their biological significance.

      (3) In Figure 3C, the authors performed pathway enrichment analysis on the STB subtypes after integrating STB_in and STB_out organoids. The enrichment of the "transport across the blood-brain barrier" pathway in the STB-3 subtype does not align with the current understanding of STB cell function. Please provide corresponding supporting evidence. Additionally, please verify whether the other functional pathways represent functions specific to the STB subtypes. 

      Interestingly, many of the genes categorized under “transport across the blood-brain barrier” are transporters shared with “vascular transport.” These include genes involved in the transport of amino acids (SLC7A1, SLC38A1, SLC38A3, SLC7A8), molecules essential for lipid metabolism (SLC27A4, SLC44A1), and small molecule exchange (SLC4A4, SLC5A6). Given that the vasculature, the STB, and the blood-brain barrier all perform critical barrier functions, it is unsurprising that molecules associated with these GO terms are enriched in the STB-3 subtype, which expresses numerous transporter proteins. Since the transport of materials across the STB is a well-established function, we have not included additional supporting evidence but have clarified the genes associated with this GO term in the text (lines 392-394 and supplemental Table 9).

      (4) The pseudotime heatmap in Figure 4B is not properly arranged and is inconsistent with the differentiation relationships shown in Figure 4A. It is recommended to revise this. 

      We are uncertain which aspect of the heatmap in Figure 4A is perceived as inconsistent with Figure 4B. One distinction is that pseudotime in Figure 4A is normalized from 0 to 100 to fit the blue-to-yellow-to-red color scale, whereas in Figure 4B the color scale is not normalized and the color bar ranges from white to red. This difference reflects our intent to simplify Figure 4B-C, as the abundance of color between cell types and gene expression changes required a streamlined representation to ensure the figure remained clear and easy to interpret. This is standard practice in the field and is consistent with the default code in the slingshot package.
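      For readers comparing the two panels, rescaling pseudotime to a 0 to 100 range changes only the units of the color bar, not the ordering of nuclei. The sketch below shows one common way to do this, min-max rescaling. Whether the authors used exactly this transformation is an assumption, and the function name and input values are hypothetical.

```python
def rescale_pseudotime(values, lo=0.0, hi=100.0):
    """Min-max rescale pseudotime values to [lo, hi].

    None entries (nuclei not assigned to the lineage) are passed through.
    """
    finite = [v for v in values if v is not None]
    if not finite:
        return list(values)
    vmin, vmax = min(finite), max(finite)
    span = vmax - vmin
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)
        elif span == 0:
            # All values identical: map everything to the lower bound.
            scaled.append(lo)
        else:
            scaled.append(lo + (hi - lo) * (v - vmin) / span)
    return scaled


# Hypothetical raw pseudotime values; None marks an unassigned nucleus.
scaled = rescale_pseudotime([2.0, 4.0, None, 6.0])
print(scaled)
```

      Because the transformation is monotonic, the relative order of nuclei along the trajectory is identical in the normalized and unnormalized representations.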

      (5) In Figures 4C and 4D, although RYBP is highly expressed in STB, it is difficult to support the conclusion that RYBP shows the most significant expression changes. It is recommended to provide additional evidence. 

      The claim that RYBP exhibits the most significant expression changes was based on p-value ordering of genes associated with pseudotime via the associationTest function in slingshot and not with immunofluorescence data. The text has been revised to make this distinction clear (lines 390-393).

      (6) In Figure 4E, staining for CTB marker genes is missing, and in Figure 4F, CYTO is difficult to use as a classical STB marker. It is recommended to use the CGBs antibody from Figure 4E as a STB marker for staining to provide evidence.  

      We have revised Figure 5B-C to use e-Cadherin as a CTB marker gene in TOs and the CGB antibody as a marker of STB.

      In tissue, however, obtaining a good STB marker that does not overlap with the RYBP antibody (rabbit) in term tissue is difficult as the STB downregulates hCG expression closer to term to initiate contractions. SDC1 is often used but only labels the plasma membrane so does not help in distinguishing the STB cytoplasm. We have added an image of cytokeratin, e-Cadherin, and the STB marker ENDOU to validate that our current approach with e-Cadherin and cytokeratin allows us to accurately distinguish between CTB and STB cells.

      (7) The velocity results in Figure 5A do not align with the differentiation relationships between cells and contradict the pseudotime results presented in Figure 4 by the authors. 

      The reviewer raises an interesting observation regarding the velocity map in Figure 5A, which appears to show a bifurcation into two STB subtypes. This observation aligns with similar findings reported in tissue by our colleagues (Wang et al., 2024). However, given the low number of CTB cells in our tissue dataset, we were cautious about making definitive conclusions about pseudotime without a larger sample size. Notably, the RNA velocity map closely resembles the pseudotime trajectory in TOs, with CTB transitioning into the CTB-pf subtype and subsequently into the STB. One potential explanation for discrepancies between tissue and TOs is the difference in nuclear age: nuclei in tissue can be up to nine months old, whereas those in TOs are only hours or days old. It is possible that the lineage in TOs could bifurcate if cultured for longer than 48 hours, but our current dataset captures only the early stages of the STB differentiation process. While exploring these hypotheses is fascinating, they are beyond the scope of this current study.  

      Reviewer #3 (Recommendations for the authors): 

      Amazing work - I greatly enjoyed reading the manuscript. Here are a few questions and suggestions for consideration: 

      Evidence presented throughout the results sections hints that the organoids may represent an earlier stage of placental development compared to the term. Increased hCG gene expression is observed, but as noted expression is decreased in term STB. STB:CTB ratios are also higher at term compared to the first trimester, etc. It was difficult to conclude definitively based on how data is presented in Fig 6 and discussed. Maybe there is no clear answer. Perhaps the altered cell type ratios in the organoid models (e.g., few STB in EVT enrich conditions) impact recapitulation of the in vivo local microenvironment signaling. As such, can the authors speculate on whether cell ratios could be strategically leveraged to model different gestational time points? 

      Along these same lines, syncytiotrophoblast in early implantation (before proper villi development) is often described as invasive and later at the tertiary villi stage defined by hormone production, barrier function, and nutrient/gas exchange. Do the authors think the different STB subtypes captured in the organoid models represent different stages/functions of syncytiotrophoblast in placental development? 

      Minor Comments 

      (1) Please clarify what the third number represents in the STB:CTB ratio (e.g., 1:3:1 and 2:5:1). EVT? 

      The first separator is a decimal point, not a colon (i.e., 1.3 and 2.5). These numbers should therefore be read as an STB:CTB ratio of 1.3 to 1 or 2.5 to 1.

      (2) Could consider co-localizing RYBP in term tissue with a syncytio-specific marker like CGB used for organoids (Fig 4F). 

      We addressed this concern in comment 6 to reviewer 2.

      (3) Recommend defining colors-which colors represent which module in Figure 5C in the legend and main body text. I see the labels surrounding the heatmap in 5B, but defining colors in text (e.g. cyan, magenta, etc.) would be helpful. Do the gray circles represent targets that don't belong to a specific module? Are the bolded factor names based on a certain statistical cutoff/defining criteria or were they manually selected? 

      The text of both the results and figure legends has been revised to clarify these points.

      (4) Data Availability: It would be helpful to provide supplemental table files for analyses (e.g., 5C to list the overlapping relationships in TGs for each TF/CR (5C) and 3E/6F to list DEG genes in comparisons). 

      Supplemental files for each analysis have been added (Supplemental Tables 8-14). In addition, the raw and processed data are available on GEO, and we have created an interactive Shiny App so that people without coding experience can interact with each dataset (lines 917-919).

      (5) “...and found that each sample expressed these markers (Figure 6D), suggesting..." Consider clarifying "these". 

      Text has been added to refer to a few of these marker genes within the text (line 540).

      Citations

      (1) Zappia L, Oshlack A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience. 2018;7(7):giy083. PMCID: PMC6057528

      (2) Zhou J, Xu J, Zhang L, Liu S, Ma Y, Wen X, Hao J, Li Z, Ni Y, Li X, Zhou F, Li Q, Wang F, Wang X, Si Y, Zhang P, Liu C, Bartolomei M, Tang F, Liu B, Yu J, Lan Y. Combined Single-Cell Profiling of lncRNAs and Functional Screening Reveals that H19 Is Pivotal for Embryonic Hematopoietic Stem Cell Development. Cell Stem Cell. 2019;24(2):285-298.e5. PMID: 30639035

      (3) Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers. 2024;16(7):1350. PMCID: PMC11011054

      (4) Adu-Gyamfi EA, Cheeran EA, Salamah J, Enabulele DB, Tahir A, Lee BK. Long non-coding RNAs: a summary of their roles in placenta development and pathology†. Biol Reprod. 2023;110(3):431–449. PMID: 38134961

      (5) Zheng M, Hu Y, Gou R, Nie X, Li X, Liu J, Lin B. Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation. Biomed Pharmacother. 2020;124:109810. PMID: 32000042

      (6) Khan T, Seetharam AS, Zhou J, Bivens NJ, Schust DJ, Ezashi T, Tuteja G, Roberts RM. Single Nucleus RNA Sequence (snRNAseq) Analysis of the Spectrum of Trophoblast Lineages Generated From Human Pluripotent Stem Cells in vitro. Front Cell Dev Biol. 2021;9:695248. PMCID: PMC8334858

      (7) Isakova A, Neff N, Quake SR. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc Natl Acad Sci U S A. 2021;118(51):e2113568118. PMCID: PMC8713755

      (8) Morales-Vicente DA, Zhao L, Silveira GO, Tahira AC, Amaral MS, Collins JJ, Verjovski-Almeida S. Single-cell RNA-seq analyses show that long non-coding RNAs are conspicuously expressed in Schistosoma mansoni gamete and tegument progenitor cell populations. Front Genet. 2022;13:924877. PMCID: PMC9531161

      (9) Kim DH, Marinov GK, Pepke S, Singer ZS, He P, Williams B, Schroth GP, Elowitz MB, Wold BJ. Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming. Cell Stem Cell. 2015;16(1):88–101. PMCID: PMC4291542

      (10) Yang L, Liang P, Yang H, Coyne CB. Trophoblast organoids with physiological polarity model placental structure and function. bioRxiv. 2023;2023.01.12.523752. PMCID: PMC9882188

    1. Author response:

      General Statements

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts - CRAC) with in vitro analysis, we could show that SuperPol reduced premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug inhibiting Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two of the three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1's main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer's comments regarding its application to Pol II studies, CRAC limits contamination with mature rRNA and therefore remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within the rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      (1) Description of the planned revisions

      Beyond the numerous text modifications (see below), we agree that one major point of discussion is the consequence of the increased processivity of the SuperPol mutant for the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between the rate-limiting catalytic steps of Pol I and Pol II (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with an increased processivity in vitro and in vivo, and we agree that this might come at the cost of decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewer’s remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      (2) Description of the revisions that have already been incorporated in the transferred manuscript

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the conclusion that the WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1), demonstrating consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. Both methods complement each other but serve different purposes depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups that represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′-phosphate, as found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along the rDNA, and provides a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and co-transcriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. CRAC avoids these artifacts because it is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      •  In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      •  On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      •  On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in response to the reviewer's comment.

      •  There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      •  In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are indeed normalized relative to the 5S rRNA; the value for the 18S rRNA is then arbitrarily set to 100%.
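
      A two-step normalization of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used for Figure 1C: the function name, the sample and probe labels, and the signal values are hypothetical, and the choice of the WT 18S value as the 100% reference is an assumption.

```python
def normalize_tma(signals, control="5S", reference=("WT", "18S")):
    """Normalize probe signals to an internal control, then rescale to a reference.

    signals: {sample: {probe: raw_signal}} (hypothetical layout).
    """
    # Step 1: divide each probe signal by the control (5S) signal of the same sample.
    ratios = {
        sample: {p: v / probes[control] for p, v in probes.items() if p != control}
        for sample, probes in signals.items()
    }
    # Step 2: rescale so that the chosen reference value equals 100%.
    ref_sample, ref_probe = reference
    scale = 100.0 / ratios[ref_sample][ref_probe]
    return {s: {p: r * scale for p, r in pr.items()} for s, pr in ratios.items()}


# Made-up raw signals, for illustration only.
signals = {
    "WT": {"5S": 200.0, "18S": 400.0, "25S": 300.0},
    "SuperPol": {"5S": 100.0, "18S": 300.0, "25S": 250.0},
}
normalized = normalize_tma(signals)
```

      With these invented inputs, every value is expressed as a percentage of the 5S-normalized WT 18S signal, so the 5S rRNA serves as the internal control while the 18S rRNA only fixes the scale.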

      •  Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I-specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic assays we performed. In the absence of these subunits (which can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      •  In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BMH-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      •  pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      •  Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important point for the discussion of our results. Transcription is a very dynamic process in which paused polymerases are easily detected using the CRAC assay. Elongating polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large amount of rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      •  A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively.

      •  pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research. 

      As suggested by the reviewer, we now reference these studies.

      •  p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      This statement is based on unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that the resistance conferred by the two alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is an important point, and we could provide these data upon request. This was added to the statement.

      •  p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      •  Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6D cells? Some RNAs could be degraded even in such short period of time, as even stated by the authors, so maybe an exosome depleted background could provide a more complete picture. Could also the authors explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method could be good to add already characterized mutations or other drugs to show that the technique can readily detect also well-known and expected changes. 

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      •  Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      •  Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture. 

      We addressed this point to make the figure more visually informative. In Northern Blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but it does not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if asked.

      •  "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thanks for pointing out this mistake. It has been corrected.

      •  Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      •  Figure 4: should probably be provided as supplementary material. 

      As mentioned earlier (see comments above), the presence of all Pol I-specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic assays we performed. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      •  "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      •  Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      Legend of the y-axis and title of the plot are now present.

      •  The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      •  Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      (3) Description of analyses that authors prefer not to carry out

      Does the SuperPol mutant produce more functional rRNAs?

      As Reviewer 1 requested, we agree that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre)-rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre)-rRNAs are unable to accumulate in the absence of ribosomal proteins and/or Assembly Factors (AF). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary:  

      This study provides new insights into the role of miR-19b, an oncogenic microRNA, in the developing chicken pallium. Dynamic expression pattern of miR-19b is associated with its role in regulating cell cycle progression in neural progenitor cells. Furthermore, miR-19b is involved in determining neuronal subtypes by regulating Fezf2 expression during pallial development. These findings suggest an important role for miR-19b in the coordinated spatio-temporal regulation of neural progenitor cell dynamics and its evolutionary conservation across vertebrate species.  

      Strengths:  

      The authors identified conserved roles of miR-19 in the regulation of neural progenitor maintenance between mouse and chick, and the latter is mediated by the repression of E2f8 and NeuroD1. Furthermore, the authors found that miR-19b-dependent cell cycle regulation is tightly associated with specification of Fezf1 or Mef2c-positive neurons, in spatio-temporal manners during chicken pallial development. These findings uncovered molecular mechanisms underlying microRNA-mediated neurogenic controls.  

      Weaknesses:  

      Although the authors in this study claimed striking similarities of miR-19a/b in neurogenesis between mouse and chick pallium, a previous study by Bian et al. revealed that miR-19a contributes the expansion of radial glial cells by suppressing PTEN expression in the developing mouse neocortex, while miR-19b maintains apical progenitors via inhibiting E2f2 and NeuroD1 in chicken pallium. Thus, it is still unclear whether the orthologous microRNAs regulate common or species-specific target genes.  

      In this study, we have proposed that miR-19b regulates similar phenomena in both species using different targets, such as regulation of proliferation through PTEN in mouse and through E2f8 in the chicken.

      The spatiotemporal expression patterns of miR-19b and several genes are not convincing. For example, the authors claim that NeuroD1 is initially expressed uniformly in the subventricular zone (SVZ) but disappears in the DVR region by HH29 and becomes detectable by HH35 (Figure 1). However, the in situ hybridization data revealed that NeuroD1 is highly expressed in the SVZ of the DVR at HH29 (Figure 4F). Thus, perhaps due to the problem of immunohistochemistry, the authors have not been able to detect NeuroD1 expression in Figure 1D, and the interpretation of the data may require significant modification.  

      While Fig. 1B may suggest that NeuroD1 expression has disappeared from the DVR region by HH29, this is not true in general because we have observed NeuroD1 to be expressed in the DVR at HH29 in images of other sections. In the revised version, we will include improved images for panels of Fig. 1B which accurately show the expression pattern of NeuroD1 and miR-19b at stages HH29 and HH35.  

      It seems that miR-19b is also expressed in neurons (Figure 1), suggesting the role of miR-19b must be different in progenitors and differentiated neurons. The data on the gain- and loss-of-function analysis of miR-19b on the expression of Mef2c should be carefully considered, as it is possible that these experiments disturb the neuronal functions of miR-19b rather than in the progenitors.

      As pointed out by the reviewer, it is quite possible that upon manipulation of miR-19b its neuronal functions are also perturbed in addition to its function in progenitor cells. After introducing the gain-of-function construct in progenitor cells, we have observed changes in the morphology of these cells. These data will be included in the revised version.

      The regions of chicken pallium were not consistent among figures: in Figure 1, they showed caudal parts of the pallium (HH29 and 35), while the data in Figure 4 corresponded to the rostral part of the pallium (Figure 4B).  

      We will address this by providing images from a similar region of the pallium showing Fezf2 and Mef2c expression patterns.

      The neurons expressing Fezf2 and Mef2 in the chicken pallium are not homologous neuronal subtypes to mammalian deep and superficial cortical neurons. The authors must understand that chicken pallial development proceeds in an outside-in manner. Thus, Mef2c-positive neurons in a superficial part are early-born neurons, while Fezf2-positive neurons residing in deep areas are later-born neurons. It should be noted that the expression of a single marker gene does not support cell type homology, and the authors' description "the possibility of primitive pallial lamina formation in common ancestors of birds and mammals" is misleading.  

      We appreciate this clarification and will modify or remove this statement regarding the “primitive pallial lamina formation” to avoid any confusion and misinterpretation. 

      Overexpression of CDKN1A or Sponge-19b induced ectopic expression of Fezf2 in the ventricular zone (Figure 3C, E). Do these cells maintain a progenitor state or prematurely differentiate into neurons? In addition, the authors must explain why the induction of Fezf2 is also detected in GFP-negative cells.

      We propose to follow up on the fate of these cells by extending the observation period after overexpression of CDKN1A or Sponge-19b to assess whether they retain progenitor characteristics or differentiate. The presence of Fezf2 in GFP-negative cells could be due to non-cell-autonomous effects, and we will discuss this possibility in the revised manuscript.

      Reviewer #2 (Public review):  

      Summary:  

      This paper investigates the general concept that avian and mammalian pallium specifications share similar mechanisms. To explore that idea, the authors focus their attention on the role of miR-19b as a key controlling factor in the neuronal proliferation/differentiation balance. To do so, the authors checked the expression and protein level of several genes involved in neuronal differentiation, such as NeuroD1 or E2f8, genes also expressed in mammals after conducting their functional gene manipulation experiments. The work also shows a dysregulation in the number of neurons from lower and upper layers when miR-19b expression is altered.  

      To test it, the authors conducted a series of functional experiments of gain and loss of function (G&LoF) and enhancer-reporter assays. The enhancer-reporter assays demonstrate a direct relationship between miR-19b and NeuroD1 and E2f8 which is also validated by the G&LoF experiments. It is also noteworthy that miR-19b acts by maintaining the progenitor cells from the ventricular zone in an undifferentiated stage, thus promoting them into a stage of cellular division.

      Overall, the paper argues that the expression of miR-19b in the ventricular zone promotes the cells in a proliferative phase and inhibits the expression of differentiation genes such as E2f8 and NeuroD1. The authors claim that a decrease in the progenitor cell pool leads to an increase and decrease in neurons in the lower and upper layers, respectively.

      Strengths:  

      (1) Novelty Contribution  

      The paper offers strong arguments to prove that the neurodevelopmental basis between mammals and birds is quite the same. Moreover, this work contributes to a better understanding of brain evolution along the animal evolutionary tree and will give us a clearer idea about the roots of how our brain has been developed. This stands in contrast to the conventional framing of mammal brain development as an independent subject unlinked to the "less evolved species". The authors also nicely show a concept that was previously restricted to mammals - the role of microRNAs in development.  

      (2) Right experimental approach  

      The authors perform a set of functional experiments correctly adjusted to answer the role of miR-19b in the control of neuronal stem cell proliferation and differentiation. Their histological, functional, and genetic approach gives us a clear idea about the relations between several genes involved in the differentiation of the neurons in the avian pallium. In this idea, they maintain the role of miR-19b as a hub controller, keeping the ventricular zone cells in an undifferentiated stage to perpetuate the cellular pool.  

      (3) Future directions  

      The findings open a door to future experiments, particularly to a better comprehension of the role of microRNAs and pallidal genetic connections. Furthermore, this work also proves the use of avians as a model to study cortical development due to the similarities with mammals.  

      Weaknesses:  

      While there are questions answered, there are still several that remain unsolved. The experiments analyzed here lead us to speculate that the early differentiation of the progenitor cells from the ventricular zone entails a reduction in the cellular pool, affecting thereafter the number of later-born neurons (upper layers). The authors should explore that option by testing progenitor cell markers in the ventricular zone, such as Pax6. Even so, it remains possible that miR-19b is also changing the expression pattern of neurons that are going to populate the different layers, instead of their numbers, so the authors cannot rule that out or verify it. Since the paper focuses on the role of miR-19b in patterning, I think the authors should check the relationship and expression between progenitors (Pax6) and intermediate (Tbr2) cells when miR-19b is affected. Since neuronal expression markers change so fast within a few days (HH24-HH35), I don't understand why the authors stop the functional experiments at different time points.

      To address this, we will examine the expression of Pax6 and Tbr2 following both gain-of-function and loss-of-function manipulations of miR-19b. We agree with the reviewer that miR-19b may influence not only the number of neurons but also the expression pattern of neuronal markers.  Due to the limitations of our experimental design, we acknowledge that this possibility cannot be ruled out. 

      Regarding time points chosen for the functional experiments: We selected different stages based on the expression dynamics of specific markers. To detect possible ectopic induction, we analyzed developmental stages where the expression of a given marker is normally absent. Conversely, to detect loss of expression we examined stages in which the marker is typically expressed robustly. This approach allowed us to better interpret the functional consequences of miR-19b manipulation within relevant developmental windows. 

      Reviewer #3 (Public review):  

      Summary:  

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.  

      Strengths:  

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.  

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.  

      Weaknesses:  

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.

      Major Concerns:  

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:  

      a) Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.

      We agree that precise anatomical context is essential. In the revised version, we propose to: 

      a) Include schematics of the regions of interest where experimental manipulations were performed.

      b) Provide low-magnification panoramic images where appropriate, for anatomical reference.

      c) Show the expression patterns of relevant marker genes to better justify stages and region selection. 

      d) Provide the expression pattern of markers in panoramic view to show differential expression in the DVR and Wulst region and interpret our results accordingly.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if this were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please, revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?  

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.  

      We appreciate the correction suggested by the reviewer. In the revised manuscript: a) the SVZ will be labeled correctly in all figures and descriptions; b) the mantle zone terminology will be incorporated appropriately; c) the two Xenopus-based references in line 118 will be removed as they are not directly relevant; and d) we will refer to Rueda-Alaña et al. (2025) to guide accurate anatomical labeling and interpretation of proliferative zones.

      We also acknowledge that while some proliferative cells exist in the SVZ of the chicken, they are relatively few and do not express typical basal progenitor markers such as Tbr2 (Nomura et al., 2016, Development). We will ensure that this nuance is clearly reflected in the text. 

      c) What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.

      We propose to modify the text and figures to accurately represent the correct location of the Wulst in the chick pallium.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium are highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.

      We acknowledge this conceptual oversight. In the manuscript: a) we will avoid direct comparisons between the entire chick pallium and the mammalian neocortex; b) terms like “upper-layer-like” and “deep-layer-like” neurons will be removed or modified; c) we will cite and integrate recent findings from Rueda-Alaña et al. (2025), Zaremba et al. (2025), and Hecker et al. (2025), which provide updated insights from scRNAseq analyses into the complexity of avian pallial neurons. Cell types will be described based on marker gene expression only, without unsupported evolutionary or homology claims.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.  

      We agree with the reviewer. In the revised version, we will remove the misleading terms and outdated concepts and avoid speculative evolutionary conclusions.  

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.  

      In the revised version, these above-mentioned articles (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) will be included in the introduction and discussion.  Our interpretations will be updated to reflect these new insights into neuronal diversity and regionalization in the chick pallium. 

      b) Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.

      We will replace all instances of “auditory cortex” with “Field L”, as per the accepted terminology in the Avian Nomenclature Forum.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However, the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.  

      In the revised version, we will replace “forebrain” with “pallium” throughout the manuscript to more accurately reflect the regions studied.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.  

      We agree with the reviewer that an alteration in one marker for a particular cell type may not indicate a change in patterning. However, including the effect of miR-19b gain- and loss-of-function on Pax6 and Tbr2 may strengthen the idea that it affects patterning, as suggested by reviewer #2.

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.  

      In the revised version, we propose to include a diagram to visually summarize the proposed interactions between miR-19b, E2f8, NeuroD1, and other key regulators.  

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.  

      We will expand the Methods section to provide more detailed protocols and justifications for experimental design, in alignment with journal policy.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand the neural basis of implicit causal inference, specifically how people infer causes of illness. They use fMRI to explore whether these inferences rely on content-specific semantic networks or broader, domain-general neurocognitive mechanisms. The study explores two key hypotheses: first, that causal inferences about illness rely on semantic networks specific to living things, such as the 'animacy network,' given that illnesses affect only animate beings; and second, that there might be a common brain network supporting causal inferences across various domains, including illness, mental states, and mechanical failures. By examining these hypotheses, the authors aim to determine whether causal inferences are supported by specialized or generalized neural systems.

      The authors observed that inferring illness causes selectively engaged a portion of the precuneus (PC) associated with the semantic representation of animate entities, such as people and animals. They found no cortical areas that responded to causal inferences across different domains, including illness and mechanical failures. Based on these findings, the authors concluded that implicit causal inferences are supported by content-specific semantic networks, rather than a domain-general neural system, indicating that the neural basis of causal inference is closely tied to the semantic representation of the specific content involved.

      Strengths:

      (1) The inclusion of the four conditions in the design is well thought out, allowing for the examination of the unique contribution of causal inference of illness compared to either a different type of causal inference (mechanical) or non-causal conditions. This design also has the potential to identify regions involved in a shared representation of inference across general domains.

      (2) The presence of the three localizers for language, logic, and mentalizing, along with the selection of specific regions of interest (ROIs), such as the precuneus and anterior ventral occipitotemporal cortex (antVOTC), is a strong feature that supports a hypothesis-driven approach (although see below for a critical point related to the ROI selection).

      (3) The univariate analysis pipeline is solid and well-developed.

      (4) The statistical analyses are a particularly strong aspect of the paper.

      Weaknesses:

      Based on the current analyses, it is not yet possible to rule out the hypothesis that inferring illness causes relies on neurocognitive mechanisms that support causal inferences irrespective of their content, neither in the precuneus nor in other parts of the brain.

      (1) The authors, particularly in the multivariate analyses, do not thoroughly examine the similarity between the two conditions (illness-causal and mechanical-causal), as they are more focused on highlighting the differences between them. For instance, in the searchlight MVPA analysis, an interesting decoding analysis is conducted to identify brain regions that represent illness-causal and mechanical-causal conditions differently, yielding results consistent with the univariate analyses. However, to test for the presence of a shared network, the authors only perform the Causal vs. Non-causal analysis. This analysis is not very informative because it includes all conditions mixed together and does not clarify whether both the illness-causal and mechanical-causal conditions contribute to these results.

      (2) To address this limitation, a useful additional step would be to use as ROIs the different regions that emerged in the Causal vs. Non-causal decoding analysis and to conduct four separate decoding analyses within these specific clusters:

      (a) Illness-Causal vs. Non-causal - Illness First;

      (b) Illness-Causal vs. Non-causal - Mechanical First;

      (c) Mechanical-Causal vs. Non-causal - Illness First;

      (d) Mechanical-Causal vs. Non-causal - Mechanical First.

      This approach would allow the authors to determine whether any of these ROIs can decode both the illness-causal and mechanical-causal conditions against at least one non-causal condition.

      (3) Another possible analysis to investigate the existence of a shared network would be to run the searchlight analysis for the mechanical-causal condition versus the two non-causal conditions, as was done for the illness-causal versus non-causal conditions, and then examine the conjunction between the two. Specifically, the goal would be to identify ROIs that show significant decoding accuracy in both analyses.

      The hypothesis that a neural mechanism supports causal inference across domains predicts higher univariate responses when causal inferences occur than when they do not. This prediction was not generated by us ad hoc but rather has been made by almost all previous cognitive neuroscience papers on this topic (Ferstl & von Cramon, 2001; Satpute et al., 2005; Fugelsang & Dunbar, 2005; Kuperberg et al., 2006; Fenker et al., 2010; Kranjec et al., 2012; Pramod, Chomik-Morales, et al., 2023; Chow et al., 2008; Mason & Just, 2011; Prat et al., 2011). Contrary to this hypothesis, we find that the precuneus (PC) is most activated for illness inferences and most deactivated for mechanical inferences relative to rest, suggesting that the PC does not support domain-general causal inference. To further probe the selectivity of the PC for illness inferences, we created group overlap maps that compare PC responses to illness inferences and mechanical inferences across participants. The PC shows a strong preference for illness inferences and is therefore unlikely to support causal inferences irrespective of their content (Supplementary Figures 6 and 7). We also note that, in whole-cortex analysis, no shared regions responded more to causal inference than noncausal vignettes across domains. Therefore, the prediction made by the ‘domain-general causal engine’ proposal as it has been articulated in the literature is not supported in our data.

      Taking a multivariate approach, the hypothesis that a neural mechanism supports causal inference across domains also predicts that relevant regions can decode between all possible pairs of causal vs. noncausal conditions (e.g., Illness-Causal vs. Noncausal-Illness First, Mechanical-Causal vs. Noncausal-Illness First, etc.). The analysis described by the reviewer in (2), in which the regions that distinguish between causal vs. noncausal conditions in searchlight MVPA are used as ROIs to test various causal vs. noncausal contrasts, is non-independent. Therefore, we cannot perform this analysis. In accordance with the reviewer’s suggestions in (3), we now include searchlight MVPA results for the mechanical inference condition compared to the two noncausal conditions (Supplementary Figure 9). No regions are shared across the searchlight analyses comparing all possible pairs of causal and noncausal conditions, providing further evidence that there are no shared neural responses to causal inference in our dataset.
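
      As an illustration only, the conjunction logic behind "shared across all causal vs. noncausal searchlight analyses" can be sketched as below. This is a minimal sketch under stated assumptions: the voxel ids, p-values, threshold, and the `conjunction` helper are hypothetical and are not the study's data or pipeline. A voxel counts as shared only if it is significant in every map.

```python
# Hypothetical illustration: voxel ids and p-values are invented, not study data.

def conjunction(maps, alpha=0.05):
    """Return the sorted voxel ids that are significant (p < alpha) in every map.

    `maps` is a list of dicts mapping voxel id -> (assumed corrected) p-value.
    A voxel missing from a map is treated as not significant there.
    """
    if not maps:
        return []
    shared = set(maps[0])
    for m in maps:
        shared &= {v for v, p in m.items() if p < alpha}
    return sorted(shared)


# Toy p-value maps for the four causal vs. noncausal searchlight contrasts.
illness_vs_nc1 = {1: 0.01, 2: 0.20, 3: 0.03}
illness_vs_nc2 = {1: 0.02, 2: 0.04, 3: 0.30}
mech_vs_nc1 = {1: 0.40, 2: 0.01, 3: 0.02}
mech_vs_nc2 = {1: 0.03, 2: 0.60, 3: 0.01}

all_maps = [illness_vs_nc1, illness_vs_nc2, mech_vs_nc1, mech_vs_nc2]
shared_voxels = conjunction(all_maps)                        # all four contrasts
illness_only = conjunction([illness_vs_nc1, illness_vs_nc2])  # illness contrasts only
```

      In this toy case one voxel survives the conjunction of the two illness contrasts, but none survives the conjunction of all four, which is the pattern a content-specific (rather than domain-general) account would produce.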

      (4) Along the same lines, for the ROI MVPA analysis, it would be useful not only to include the illness-causal vs. mechanical-causal decoding but also to examine the illness-causal vs. non-causal conditions and the mechanical-causal vs. non-causal conditions. Additionally, it would be beneficial to report these data not just in a table (where only the mean accuracy is shown) but also using dot plots, allowing the readers to see not only the mean values but also the accuracy for each individual subject.

      We have performed these analyses and now include a table of the results as well as figures displaying the dispersion across participants (Supplementary Tables 2 and 3, Supplementary Figures 10 and 11). In the left PC, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the same noncausal condition. The language network did not decode between any causal/noncausal pairs. In the logic network, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the other noncausal condition. Thus, no regions showed the predicted ‘domain-general’ pattern, i.e., significant decoding between all causal/noncausal pairs. 

      Importantly, the decoding results must be interpreted in light of significant univariate differences across conditions (e.g., greater responses to illness inferences compared to noncausal vignettes in the PC). Linear classifiers are highly sensitive to univariate differences (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022).
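
      The sensitivity of linear classifiers to univariate differences can be illustrated with a small simulation. This is a hedged sketch, not the analysis used in the study: the data are simulated, the nearest-centroid classifier is a simple stand-in for a linear classifier, and all names and parameter values are assumptions chosen for illustration. Two conditions differ only by a uniform mean offset across voxels; decoding succeeds on the raw patterns and should fall to roughly chance once each pattern's mean is removed.

```python
import random


def simulate(n_trials, n_voxels, offset, rng):
    """Patterns of independent Gaussian noise plus a uniform mean offset."""
    return [[rng.gauss(0.0, 1.0) + offset for _ in range(n_voxels)]
            for _ in range(n_trials)]


def center(pattern):
    """Remove the pattern's mean across voxels (its univariate component)."""
    mu = sum(pattern) / len(pattern)
    return [x - mu for x in pattern]


def centroid(patterns):
    """Voxel-wise mean pattern of a set of trials."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_accuracy(train_a, train_b, test_a, test_b):
    """Two-class nearest-centroid classifier (linear decision boundary)."""
    ca, cb = centroid(train_a), centroid(train_b)
    correct = sum(sq_dist(p, ca) < sq_dist(p, cb) for p in test_a)
    correct += sum(sq_dist(p, cb) <= sq_dist(p, ca) for p in test_b)
    return correct / (len(test_a) + len(test_b))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_voxels = 20
train_a = simulate(100, n_voxels, 1.0, rng)  # condition A: higher mean response
train_b = simulate(100, n_voxels, 0.0, rng)  # condition B: baseline
test_a = simulate(200, n_voxels, 1.0, rng)
test_b = simulate(200, n_voxels, 0.0, rng)

acc_raw = nearest_centroid_accuracy(train_a, train_b, test_a, test_b)
acc_centered = nearest_centroid_accuracy(
    [center(p) for p in train_a], [center(p) for p in train_b],
    [center(p) for p in test_a], [center(p) for p in test_b])
```

      Because the only difference between the simulated conditions is the mean offset, above-chance `acc_raw` here reflects a univariate effect rather than a distinct spatial pattern, which is why decoding results need to be read alongside the univariate contrasts.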

      (5) The selection of Regions of Interest (ROIs) is not entirely straightforward:

      In the introduction, the authors mention that recent literature identifies the precuneus (PC) as a region that responds preferentially to images and words related to living things across various tasks. While this may be accurate, we can all agree that other regions within the ventral occipital-temporal cortex also exhibit such preferences, particularly areas like the fusiform face area, the occipital face area, and the extrastriate body area. I believe that at least some parts of this network (e.g., the fusiform gyrus) should be included as ROIs in this study. This inclusion would make sense, especially because a complementary portion of the ventral stream known to prefer non-living items (i.e., anterior medial VOTC) has been selected as a control ROI to process information about the mechanical-causal condition. Given the main hypothesis of the study - that causal inferences about illness might depend on content-specific semantic representations in the 'animacy network' - it would be worthwhile to investigate these ROIs alongside the precuneus, as they may also yield interesting results.

      We thank the reviewer for their suggestion to test the FFA region. We think this provides an interesting comparison to the PC and hypothesized that, in contrast to the PC, the FFA does not encode abstract causal information about animacy-specific processes (i.e., illness). As we mention in the Introduction, although the fusiform face area (FFA) also exhibits a preference for animates, it does so primarily for images in sighted people (Kanwisher et al., 1997; Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Konkle & Caramazza, 2013; Connolly et al., 2016; Bi et al., 2016).

      We did not select the FFA as a region of interest when preregistering the current study because we did not predict it would show sensitivity to causal knowledge. In accordance with the reviewer’s suggestions, we now include the FFA as an ROI in individual-subject univariate analysis (Supplementary Figure 8, Appendix 4). Because we did not run a separate FFA localizer task when collecting the data, we used FFA search spaces from a previous study investigating responses to face images (Julian et al., 2012). We followed the same analysis procedure that was used to investigate responses to illness inferences in the PC. Neither left nor right FFA exhibited a preference for illness inferences compared to mechanical inferences or to the noncausal conditions. This result is interesting and is now briefly discussed in the Discussion section.

      (6) Visual representation of results:

      In all the figures related to ROI analyses, only mean group values are reported (e.g., Figure 1A, Figure 3, Figure 4A, Supplementary Figure 6, Figure 7, Figure 8). To better capture the complexity of fMRI data and provide readers with a more comprehensive view of the results, it would be beneficial to include a dot plot for a specific time point in each graph. This could be a fixed time point (e.g., a certain number of seconds after stimulus presentation) or the time point showing the maximum difference between the conditions of interest. Adding this would allow for a clearer understanding of how the effect is distributed across the full sample, such as whether it is consistently present in every subject or if there is greater variability across individuals.

      We thank the reviewer for this suggestion. We now include scattered box plots displaying the dispersion in average percent signal change across participants in Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14.

      (7) Task selection:

      (a) To improve the clarity of the paper, it would be helpful to explain the rationale behind the choice of the selected task, specifically addressing: (i) why an implicit inference task was chosen instead of an explicit inference task, and (ii) why the "magic detection" task was used, as it might shift participants' attention more towards coherence, surprise, or unexpected elements rather than the inference process itself.

      (b) Additionally, the choice to include a large number of catch trials is unusual, especially since they are modeled as regressors of non-interest in the GLM. It would be beneficial to provide an explanation for this decision.

      We chose an orthogonal foil detection task, rather than an explicit causal judgment task, to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes (see Kuperberg et al., 2006 for discussion). Analogous foil detection paradigms have been used to study sentence processing and word recognition (Pallier et al., 2011; Dehaene-Lambertz et al., 2018). We now clarify this in the Introduction. The “magical” element occurred both within and across sentences so that participants could not use coherence as a cue to complete the task. Approximately 1/5 (19%) of the trials were magical catch trials to ensure that participants remained attentive throughout the experiment.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesize that "causal inferences about illness depend on content-specific semantic representations in the animacy network". They test this hypothesis in an fMRI task, by comparing brain activity elicited by participants' exposure to written situations suggesting a plausible cause of illness with brain activity in linguistically equivalent situations suggesting a plausible cause of mechanical failure or damage and non-causal situations. These contrasts identify PC as the main "culprit" in a whole-brain univariate analysis. Then the question arises of whether the content-specificity has to do with inferences about animates in general, or if there are some distinctions between reasoning about people's bodies versus mental states. To answer this question, the authors localize the mentalizing network and study the relation between brain activity elicited by Illness-Causal > Mech-Causal and Mentalizing > Physical stories. They conclude that inferring about the causes of illness partially differentiates from reasoning about people's states of mind. The authors finally test the alternative yet non-mutually exclusive hypothesis that both types of causal inferences (illness and mechanical) depend on shared neural machinery. Good candidates are language and logic, which justifies the use of a language/logic localizer. No evidence of commonalities across causal inferences versus non-causal situations is found.

      Strengths:

      (1) This study introduces a useful paradigm and well-designed set of stimuli to test for implicit causal inferences.

      (2) Another important methodological advance is the addition of physical stories to the original mentalizing protocol.

      (3) With these tools, or a variant of these tools, this study has the potential to pave the way for further investigation of naïve biology and causal inference.

      Weaknesses:

      (1) This study is missing a big-picture question. It is not clear whether the authors investigate the neural correlates of causal reasoning or of naïve biology. If the former, the choice of an orthogonal task, making causal reasoning implicit, is questionable. If the latter, the choice of mechanical and physical controls can be seen as reductive and problematic.

      We have modified the Introduction to clarify that the primary goal of the current study is to test the claim that semantic networks encode causal knowledge – in this case, causal intuitive theories of biology. Most conceptions of intuitive biology, intuitive psychology, and intuitive physics describe them as causal frameworks (e.g., Wellman & Gelman, 1992; Simons & Keil, 1995; Keil et al., 1999; Tenenbaum, Griffiths, & Niyogi, 2007; Gopnik & Wellman, 2012; Gerstenberg & Tenenbaum, 2017). As noted above, we chose an implicit task to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes. We are not sure what the reviewer means when they say that mechanical and physical controls are reductive. This is the standard control condition in neural and behavioral paradigms that investigate intuitive psychology and intuitive biology (e.g., Saxe & Kanwisher, 2003; Gelman & Wellman, 1991).

      (2) The rationale for focusing mostly on the precuneus is not clear and this choice could almost be seen as a post-hoc hypothesis.

      This study is preregistered (https://osf.io/6pnqg). The preregistration states that the precuneus is a hypothesized area of interest, so this is not a post-hoc hypothesis. Our hypothesis was informed by multiple prior studies implicating the precuneus in the semantic representation of animates (e.g., people, animals) (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). We also conducted a pilot experiment with separate participants prior to pre-registering the study. We now clarify our rationale for focusing on the precuneus in the Introduction:

      “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses). Thinking about living things (animates) as opposed to non-living things (inanimate objects/places) recruits partially distinct neural systems (e.g., Warrington & Shallice, 1984; Hillis & Caramazza, 1991; Caramazza & Shelton, 1998; Farah & Rabinowitz, 2003). The precuneus (PC) is part of the ‘animacy’ semantic network and responds preferentially to living things (i.e., people and animals), whether presented as images or words (Devlin et al., 2002; Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). By contrast, parts of the visual system (e.g., fusiform face area) that respond preferentially to animates do so primarily for images (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Mahon et al., 2009; Konkle & Caramazza, 2013; Connolly et al., 2016; see Bi et al., 2016 for a review). We hypothesized that the PC represents causal knowledge relevant to animates and tested the prediction that it would be activated during implicit causal inferences about illness, which rely on such knowledge (preregistration: https://osf.io/6pnqg).”

      (3) The choice of an orthogonal 'magic detection' task has three problematic consequences in this study:

      (a) It differs in nature from the 'mentalizing' task that consists of evaluating a character's beliefs explicitly from the corresponding story, which complicates the study of the relation between both tasks. While the authors do not compare both tasks directly, it is unclear to what extent this intrinsic difference between implicit versus explicit judgments of people's body versus mental states could influence the results.

      (b) The extent to which the failure to find shared neural machinery between both types of inferences (illness and mechanical) can be attributed to the implicit character of the task is not clear.

      (c) The introduction of a category of non-interest that contains only 36 trials compared to 38 trials for all four categories of interest creates a design imbalance.

      We disagree with the reviewer’s argument that our use of an implicit “magic detection” task is problematic. Indeed, we think it is one of the advances of the current study over prior work.

      a) Prior work has shown that implicit mentalizing tasks (e.g., naturalistic movie watching) engage the theory of mind network, suggesting that the implicit/explicit nature of the task does not drive the activation of this network (Jacoby et al., 2016; Richardson et al., 2018). With these data in mind, it is unlikely that the implicit/explicit nature of the causal inference and theory of mind tasks in the present experiment can explain observed differences between them.

      b) Explicit causal inferences introduce a collection of executive processes that potentially confound the results and make it difficult to know whether neural signatures are related to causal inference per se. The current study focuses on the neural basis of implicit causal inference, a type of inference that is made routinely during language comprehension. We do not claim to find neural signatures of all causal inferences; we do not think any study could claim to do so, because causal inferences are a highly varied class.

      c) Our findings do not exclude the possibility that content-invariant responses are elicited during explicit causality judgments. We clarify this point in the Results (e.g., “These results leave open the possibility that domain-general systems support the explicit search for causal connections”) and Discussion (e.g., “The discovery of novel causal relationships (e.g., ‘blicket detectors’; Gopnik et al., 2001) and the identification of complex causes, even in the case of illness, may depend in part on domain-general neural mechanisms”).

      d) Because the magic trials are excluded from our analyses, it is unclear how the imbalance in the number of magic trials could influence the results and our interpretation of them. We note that the number of catch trials in standard target detection paradigms is sometimes much lower than the number of target trials in each condition (e.g., Pallier et al., 2011).

      (4) Another imbalance is present in the design of this study: the number of trials per category is not the same in each run of the main task. This imbalance does not seem to be accounted for in the 1st-level GLM and renders a bit problematic the subsequent use of MVPA.

      Each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and should not impact our interpretations of the data, particularly because we average responses from each condition within a run before submitting them to MVPA.
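      The trial counts given in this response can be checked with simple arithmetic. The sketch below is illustrative only: the variable names, the order of runs, and the `average_within_run` helper are assumptions made for the example, not the authors' analysis code. It confirms that 7 trials in two runs plus 6 trials in four runs gives 38 trials per condition, that 36 catch trials make up roughly 19% of all trials, and it shows one simple way of averaging per-trial patterns within a run before MVPA.

```python
from statistics import mean

# Counts taken from the response text: 6 runs, each condition shown
# 7 times in two runs and 6 times in four runs; 36 catch trials overall.
TRIALS_PER_RUN = [7, 7, 6, 6, 6, 6]  # one condition; run order is an assumption
N_CONDITIONS = 4
N_CATCH = 36

trials_per_condition = sum(TRIALS_PER_RUN)
total_trials = N_CONDITIONS * trials_per_condition + N_CATCH
catch_fraction = N_CATCH / total_trials


def average_within_run(trial_patterns):
    """Collapse per-trial voxel patterns from one run into one mean pattern.

    trial_patterns: list of equal-length lists, one list of voxel values
    per trial (a hypothetical data layout chosen for this sketch).
    """
    n_voxels = len(trial_patterns[0])
    return [mean(p[v] for p in trial_patterns) for v in range(n_voxels)]
```

      Averaging within a run yields one pattern per condition per run regardless of whether that run contained 6 or 7 trials, which is why a one-trial imbalance has little influence on the inputs to a classifier.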

      (5) The main claim of the authors, encapsulated by the title of the present manuscript, is not tested directly. While the authors included in their protocol independent localizers for mentalizing, language, and logic, they did not include an independent localizer for "animacy". As such, they cannot provide a within-subject evaluation of their claim, which is entirely based on the presence of a partial overlap in PC (which is also involved in a wide range of tasks) with previous results on animacy.

      We respectfully disagree with this assertion. Our primary analysis uses a within-subject leave-one-run-out approach. This approach allows us to use part of the data itself to localize animacy-relevant causal responses in the PC without engaging in ‘double-dipping’ or statistical non-independence (Vul & Kanwisher, 2011). We also use the mentalizing network localizer as a partial localizer for animacy. This is because the control condition (physical reasoning) does not include references to people or any animate agents (Supplementary Figures 1 and 15). We now clarify this point in the Methods section of the paper (see below).

      From the Methods: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011; http://saxelab.mit.edu/use-our-efficient-false-belief-localizer)...Our physical stories incorporated more vivid descriptions of physical interactions and did not make any references to human agents, enabling us to use the mentalizing localizer as a localizer for animacy.”
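      The leave-one-run-out logic mentioned in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the nested-list data layout, and the choice of selecting a fixed number of top voxels (`n_top`) are all hypothetical. The key property it demonstrates is that voxels are selected using only the training runs, while the response is read out from the held-out run.

```python
def leave_one_run_out_response(contrast_by_run, response_by_run, n_top=2):
    """Estimate an fROI response without statistical non-independence.

    contrast_by_run: list (one entry per run) of per-voxel contrast values
        used to select voxels (hypothetical layout).
    response_by_run: list (one entry per run) of per-voxel responses to
        the condition of interest.
    n_top: number of voxels kept in each fold (an assumed criterion).
    Returns the held-out response averaged over folds.
    """
    n_runs = len(contrast_by_run)
    n_voxels = len(contrast_by_run[0])
    held_out_means = []
    for test_run in range(n_runs):
        train_runs = [r for r in range(n_runs) if r != test_run]
        # Voxel selection uses only the training runs.
        train_contrast = [
            sum(contrast_by_run[r][v] for r in train_runs) / len(train_runs)
            for v in range(n_voxels)
        ]
        top_voxels = sorted(
            range(n_voxels), key=lambda v: train_contrast[v], reverse=True
        )[:n_top]
        # The response is read out only from the independent held-out run.
        held_out = response_by_run[test_run]
        held_out_means.append(sum(held_out[v] for v in top_voxels) / n_top)
    return sum(held_out_means) / n_runs
```

      Because no run contributes to both selection and measurement within a fold, the resulting estimate is not inflated by selecting voxels on the same data used to test them.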

      Reviewer #3 (Public review):

      Summary:

      This study employed an implicit task, showing vignettes to participants while a BOLD signal was acquired. The aim was to capture automatic causal inferences that emerge during language processing and comprehension. In particular, the authors compared causal inferences about illness with two control conditions, causal inferences about mechanical failures and non-causal phrases related to illnesses. All phrases that were employed described contexts with people, to avoid animacy/inanimate confound in the results. The authors had a specific hypothesis concerning the role of the precuneus (PC) in being sensitive to causal inferences about illnesses.

      These findings indicate that implicit causal inferences are facilitated by semantic networks specialized for encoding causal knowledge.

      Strengths:

      The major strength of the study is the clever design of the stimuli (which are nicely matched for a number of features) which can tease apart the role of the type of causal inference (illness-causal or mechanical-causal) and the use of two localizers (logic/language and mentalizing) to investigate the hypothesis that the language and/or logical reasoning networks preferentially respond to causal inference regardless of the content domain being tested (illnesses or mechanical).

      Weaknesses:

      I have identified the following main weaknesses:

      (1) Precuneus (PC) and Temporo-Parietal junction (TPJ) show very similar patterns of results, and the manuscript is mostly focused on PC (also the abstract). To what extent does the fact that PC and TPJ show similar trends affect the inferences we can derive from the results of the paper? I wonder whether additional analyses (connectivity?) would help provide information about this network.

      We thank the reviewer for this suggestion. While the PC shows the most robust univariate preference for illness inferences compared to both mechanical inferences and noncausal vignettes, the TPJ also shows a preference for illness inferences compared to mechanical inferences in individual-subject fROI analysis. However, as we mention in the Results section, the TPJ does not show a preference for illness inferences compared to noncausal vignettes, suggesting that the TPJ is selective for animacy but may not be as sensitive to causal knowledge about animacy-specific processes. When describing our results, we refer to the ‘animacy network’ (i.e., PC and TPJ) but also highlight that the PC exhibited the most robust responses to illness inferences (from the Results: “Inferring illness causes preferentially recruited the animacy semantic network, particularly the PC”; from the Discussion: “We find that a semantic network previously implicated in thinking about animates, particularly the precuneus (PC), is preferentially engaged when people infer causes of illness…”). We did not collect resting state data that would enable a connectivity analysis, as the reviewer suggests. This is an interesting direction for future work.

      (2) Results are mainly supported by a univariate ROI approach, and the MVPA ROI approach is performed on a subregion of one of the ROI regions (left precuneus). Results could then have a limited impact on our understanding of brain functioning.

      The original and current versions of the paper include results from multiple multivariate analyses, including whole-cortex searchlight MVPA and individual-subject fROI MVPA performed in multiple search spaces (see Supplementary Figures 10 and 11, Supplementary Tables 2 and 3).

      We note that our preregistered predictions focused primarily on univariate differences. This is because the current study investigates neural responses to inferences, and univariate increases in activity are thought to reflect the processing of such inferences. We use multivariate analyses to complement our primary univariate analyses. However, given that we observe significant univariate effects and that multivariate analyses are heavily influenced by significant univariate effects (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022), our univariate results constitute the main findings of the paper.

      (3) In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). These figures show that there is high overlap across participants in PC responses to illness inferences but not mechanical inferences. In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3. 

      (4) Sometimes acronyms are defined in the text after they appear for the first time.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I was unable to access the pre-registration on OSF because special permission is required.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      (2) The length of the MRI session is quite long (around 2 hours). It is generally discouraged to have such extended data acquisition periods, as this can affect the stability and cleanliness of the data. Did you observe any effects of fatigue or attention decline in your data?

      The session was 2 hours long including 1-2 10-minute breaks. Without breaks, the scan would be approximately 1.5 hours. This is a standard length for MRI experiments. The main experiment (causal inference task) was always conducted first and lasted approximately 1 hour. Accuracy did not decrease across the 6 runs of this experiment (repeated measures ANOVA, F<sub>(5,114)</sub> = 1.35, p = .25).

      (3) The last sentence of the results states: "Although MVPA searchlight analysis identified several areas where patterns of activity distinguished between causal and non-causal vignettes, all of these regions showed a preference for non-causal vignettes in univariate analysis (Supplementary Figure 5)." This statement is not entirely accurate. As I previously pointed out, the MVPA searchlight analysis is not very informative and is difficult to interpret. However, as previously suggested, there are additional steps that could be taken to better understand and interpret these results. It is incorrect to conclude that because the brain regions identified in the MVPA analyses show a preference for non-causal vignettes in univariate analyses, the multivariate results lack value. While univariate analyses may show a preference for a specific condition, multivariate analyses can reveal more fine-grained representations of multiple conditions. For a notable example, consider the fusiform face area (FFA) that shows a clear preference for faces at the univariate level but can significantly decode other categories at the multivariate level, even when faces are not included in the analysis.

      The decoding analysis that the reviewer is suggesting for the current study would be analogous to identifying univariate differences between faces and places in the FFA and then decoding between faces and places and claiming that the FFA represents places because the decoding is significant. The decoding analyses enabled by our design are not equivalent to decoding within a condition (e.g., among face identities, among types of illness inferences), as the reviewer suggests above. It is not that such multivariate analyses “lack value” but that they recapitulate established univariate differences. Multivariate analyses are useful for revealing more fine-grained representations when i) significant univariate differences are not observed, or ii) it is possible to decode among categories within a condition (e.g., among face identities, among types of illness inferences). We are currently collecting data that will enable us to perform within-condition decoding analyses in future work, but the design of the current study does not allow for such a comparison.

      We note that the original quotation from the manuscript has been removed because it is no longer accurate. When including participant response time as a covariate of no interest in the GLM, no regions are shared across the 4 searchlight analyses comparing causal and noncausal conditions, suggesting that there are no shared neural responses to causal inference in our dataset.

      Reviewer #2 (Recommendations for the authors):

      (1) Moderating the strength of some claims made to justify the main hypothesis (e.g., "people but not machines transmit diseases to each other through physical contact").

      We changed this wording so that it now reads: “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses).” (Introduction)

      (2) Expanding the paragraph introducing the sub-question about inferring people's "body states" vs "mental states". In addition, given the order in which the hypotheses are introduced, and the results are presented, I would suggest switching the order of presentation of both localizers in the methods section and adding a quick reminder of the hypotheses that justify using these localizers.

      We thank the reviewer for these suggestions. In accordance with their suggestions, we have expanded the paragraph in the Introduction that introduces the “body states” vs. “mental states” question (see below). We have also switched the order of the localizer descriptions in the Methods section and added a sentence at the start of each section describing the relevant hypotheses (see below).

      From the Introduction: “We also compared neural responses to causal inferences about the body (i.e., illness) and inferences about the mind (i.e., mental states). Both types of inferences are about animate entities, and some developmental work suggests that children use the same set of causal principles to think about bodies and minds (Carey, 1985, 1988). Other evidence suggests that by early childhood, young children have distinct causal knowledge about the body and the mind (Springer & Keil, 1991; Callanan & Oakes, 1992; Wellman & Gelman, 1992; Inagaki & Hatano, 1993; 2004; Keil, 1994; Hickling & Wellman, 2001; Medin et al., 2010). For instance, preschoolers are more likely to view illness as a consequence of biological causes, such as contagion, rather than psychological causes, such as malicious intent (Springer & Ruckel, 1992; Raman & Winer, 2004; see also Legare & Gelman, 2008). The neural relationship between inferences about bodies and minds has not been fully described. The ‘mentalizing network’, including the PC, is engaged when people reason about agents’ beliefs (Saxe & Kanwisher, 2003; Saxe et al., 2006; Saxe & Powell, 2006; Dodell-Feder et al., 2011; Dufour et al., 2013). We localized this network in individual participants and measured its neuroanatomical relationship to the network activated by illness inferences.”

      From the Methods, localizer descriptions: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant… To test for the presence of domain-general responses to causal inference in the language and logic networks (e.g., Kuperberg et al., 2006; Operskalski & Barbey, 2017), we used an additional localizer task to identify both networks in each participant.”

      (3) Adding a quick analysis of lateralization to support the corresponding claim of left lateralization of responses to causal inferences.

      In accordance with the reviewer’s suggestion, we now include hemisphere as a factor in all ANOVAs comparing univariate responses across conditions.

      From the Results: “In individual-subject fROI analysis (leave-one-run-out), we similarly found that inferring illness causes activated the PC more than inferring causes of mechanical breakdown (repeated measures ANOVA, condition (Illness-Causal, Mechanical-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 19.18, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 0.3, p = .59, condition x hemisphere interaction, F<sub>(1,19)</sub> = 27.48, p < .001; Figure 1A). This effect was larger in the left than in the right PC (paired samples t-tests; left PC: t<sub>(19)</sub> = 5.36, p < .001, right PC: t<sub>(19)</sub> = 2.27, p = .04)…In contrast to the animacy-responsive PC, the anterior PPA showed the opposite pattern, responding more to mechanical inferences than illness inferences (leave-one-run-out individual-subject fROI analysis; repeated measures ANOVA, condition (Mechanical-Causal, Illness-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 17.93, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 1.33, p = .26, condition x hemisphere interaction, F<sub>(1,19)</sub> = 7.8, p = .01; Figure 4A). This effect was significant only in the left anterior PPA (paired samples t-tests; left anterior PPA: t<sub>(19)</sub> = 4, p < .001, right anterior PPA: t<sub>(19)</sub> = 1.88, p = .08).”
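      The paired-samples t-tests reported in this passage compare two conditions within the same participants, which is why 20 participants give 19 degrees of freedom. The following is a minimal sketch of that computation using only the Python standard library; the function name and the input values used to exercise it are hypothetical, and this is not the authors' analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Return (t, df) for a paired-samples t-test.

    cond_a, cond_b: per-participant responses (e.g., percent signal
    change) in two conditions, listed in the same participant order.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # t is the mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

      Applied separately to left- and right-hemisphere values, this kind of test follows up a condition x hemisphere interaction by asking whether the condition difference holds within each hemisphere.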

      (4) Making public and accessible the pre-registration OSF link.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      Reviewer #3 (Recommendations for the authors):

      In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3.

      Minor

      (1) Figure 2: Spatial dissociation between responses to illness inferences and mental state inferences in the precuneus (PC). If the analysis is the result of the MVPA, the figure should report the fact that only the left precuneus was analyzed.

      Figure 2 depicts the spatial dissociation in univariate responses to illness inferences and mental state inferences. We now clarify this in the figure legend.

      (2) VOTC and PSC acronyms are defined in the text after they appear for the first time. TPJ is never defined.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the knowledge gap between the representation of goal direction in the central complex and how motor systems stabilize movement toward that goal. The authors focused on two descending neurons, DNa01 and 02, and showed that they play different roles in steering the fly toward a goal. They also explored the connectome data to propose a model to explain how these DNs could mediate response to lateralized sensory inputs. They finally used lateralized optogenetic activation/inactivation experiments to test the roles of these neurons in mediating turnings in freely walking flies.

      Strengths:

      The experiments are well-designed and controlled. The experiment in Figure 4 is elegant, and the authors put a lot of effort into ensuring that ATP puffs do not accidentally activate the DNs. They also have explained complex experiments well. I only have minor comments for the authors.

      We are grateful for this positive feedback.

      Weaknesses:

      (1) I do not fully understand how the authors extracted the correlation functions from the population data in Figure 1. Since the ipsilateral DNs are anti-correlated with the contralateral ones, I expected that the average will drop to zero when they are pooled together (e.g., 1E-G). Of course, this will not be the case if all the data in Figure 1 are collected from the same brain hemisphere. It would be helpful if the authors could explain this.

      We regret that this information was not easy to find in our initial submission. As noted in the Figure 1D legend, “Here and elsewhere, ipsi and contra are defined relative to the recorded DN(s).” We have now added a sentence to the Results (right after we introduce Figure 1D) that also makes this point.

      (2) What constitutes the goal directions in Figures 1-3 and 8, as the authors could not use EPG activity as a proxy for goal directions? If these experiments were done in the dark, without landmarks, one would expect the fly's heading to drift randomly at times, and they would not engage the DNa01/02 for turning. Do the walking trajectories in these experiments qualify as menotactic bouts?

      Published work (Green et al., 2019) has shown that, even in the dark, flies will often walk for extended periods while holding the bump of EPG activity at a fixed location. During these epochs, the brain is essentially estimating that the fly is walking in a straight line in a fixed direction. (The fact that the fly is actually rotating a bit on the spherical treadmill is not something the fly can know, in the dark.) Thus, epochs where the EPG bump is held fixed are treated as menotactic bouts, even in darkness.

      Our results provide additional support for this interpretation. We find that, when flies are walking in darkness and holding the bump of EPG activity at a fixed location, they will make a corrective behavioral turning maneuver in response to an imposed bump-jump. This result argues that the flies are actually engaging in goal-directed straight-line walking, i.e. menotaxis, and it reproduces the findings of Green et al. (2019).

      To clarify this point, we have adjusted the wording of the Results pertaining to Figure 4.

      (3) In Figure 2B, the authors mentioned that DNa02 overpredicts and 01 underpredicts rapid turning and provided single examples. It would be nice to see more population-level quantification to support this claim.

      In this revision, we have reorganized Figures 1 and 2 (and associated text) to improve clarity. As part of this reorganization, we have removed this passage from the text, as it was a minor point in any event.

      Reviewer #2 (Public review):

      The data is largely electrophysiological recordings coupled with behavioral measurements (technically impressive) and some gain-of-function experiments in freely walking flies. Loss-of-function was tested but had minimal effect, which is not surprising in a system with partially redundant control mechanisms. The data is also consistent with/complementary to subsequent manuscripts (Yang 2023, Feng 2024, and Ros 2024) showing additional descending neurons with contributions to steering in walking and flying.

      The experiments are well executed, the results interesting, and the description clear. Some hypotheses based on connectome anatomy are tested: the insights on the pre-synaptic side - how sensory and central complex heading circuits converge onto these DNs - are stronger than the suggestions about biomechanical mechanisms for how turning happens on the motor side.

      Of particular interest is the idea that different sensory cues can converge on a common motor program. The turn-toward or turn-away mechanism is initiated by valence rather than whether the stimulus was odor or temperature or memory of heading. The idea that animals choose a direction based on external sensory information and then maintain that direction as a heading through a more internal, goal-based memory mechanism, is interesting but it is hard to separate conclusively.

      To clarify, we mention the role of memory in connection with two places in the manuscript. First, we note that the EPG/head direction system relies on learning and memory to construct a map of directional cues in the environment. These cues are, in principle, inherently neutral, i.e. without valence. Second, we note that specific mushroom body output neurons rely on learning and memory to store the valence associated with an odor. This information is not necessarily associated with an allocentric direction: it is simply the association of odor with value. Both of these ideas are well-attested by previous work.

      The reviewer may be suggesting a sequential scheme whereby the brain initializes an allocentric goal direction based on valence, and then maintains that goal direction in memory, based on that initialization. In other words, memory is used to associate valence with some allocentric direction. This seems plausible, but it is not a claim we make in our manuscript.

      The "see-saw", where left-right symmetry is broken to allow a turn, presumably by excitation on one side and inhibition of the other leg motor modules, is interesting but not well explained here. How hyperpolarization affects motor outputs is not clear.

      We have added several sentences to the Discussion to clarify this point. According to this see-saw model, steering can emerge from right/left asymmetries in excitation, or inhibition, or both. It may be nonintuitive to think that inhibitory input to a DN can produce an action. However, this becomes more plausible given our finding that DNa02 has a relatively high basal firing rate (Fig. 1D), and DNa02 hyperpolarization is associated with contraversive turning (Fig. 5A). It is also relevant to note that there are many inhibitory cell types that form strong unilateral connections onto DNa02 (e.g., AOTU019).

      The statement near Figure 5B that "DNa02 activity was higher on the side ipsilateral to the attractive stimulus, but contralateral to the aversive stimulus" is really important - and only possible to see because of the dual recordings.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary:

      Rayshubskiy et al. performed whole-cell recordings from descending neurons (DNs) of fruit flies to characterize their role in steering. Two DNs implicated in "walking control" and "steering control" by previous studies (Namiki et al., 2018, Cande et al., 2018, Chen et al., 2018) were chosen by the authors for further characterization. In-vivo whole-cell recordings from DNa01 and DNa02 showed that their activity predicts spontaneous ipsilateral turning events. The recordings also showed that while DNa02 predicts transient turns, DNa01 predicts slow sustained turns. However, optogenetic activation or inactivation showed relatively subtle phenotypes for both neurons (consistent with data in other recent preprints, Yang et al 2023 and Feng et al 2024). The authors also further characterized DNa02 with respect to its inputs and showed a functional connection with olfactory and thermosensory inputs as well as with the head-direction system. DNa01 is not characterized to this extent.

      Strengths:

      (1) In-vivo recordings and especially dual recordings are extremely challenging in Drosophila and provide a much higher resolution DN characterization than other recent studies that have relied on behavior or calcium imaging. Especially impressive are the simultaneous recordings from bilateral DNs (Figure 3). These bilateral recordings show clearly that DNa02 cells not only fire more during ipsilateral turning events but that they get inhibited during contralateral turns. In line with this observation, the difference between left and right DNa02 neuronal activity is a much better predictor of turning events compared to individual DNa02 activity.

      (2) Another technical feat in this work is driving local excitation in the head-direction neuronal ensemble (PEN-1 neurons), while simultaneously imaging its activity and performing whole-cell recordings from DNa02 (Figure 4). This impressive approach provided a way to causally relate changes in the head-direction system to DNa02 activity. Indeed, DNa02 activity could predict the rate at which an artificially triggered bump in the PEN-1 ring attractor returns to its previous stable point.

      (3) The authors also support the above observations with connectomics analysis and provide circuit motifs that can explain how the head direction system (as well as external olfactory/thermal stimuli) communicate with DNa02. All these results unequivocally put DNa02 as an essential DN in steering control, both during exploratory navigation as well as stimulus-directed turns.

      We are grateful for this detailed positive feedback.

      Weaknesses:

      (1) I understand that the first version of this preprint was already on biorxiv in 2020, and some of the "weaknesses" I list are likely a reflection of the fact that I'm tasked to review this manuscript in late 2024 (more than 4 years later). But given this is a 2024 updated version it suffers from laying out the results in contemporary terms. For instance, the manuscript lacks any reference to the DNp09 circuit implicated in object-directed turning and upstream to DNa02 even though the authors cite one of the papers where this was analyzed (Braun et al, 2024). More importantly, these studies (both Braun et al 2024 and Sapkal et al 2024) along with recent work from the authors' lab (Yang et al 2023) and other labs (Feng et al 2024) provide a view that the entire suite of leg kinematics changes required for turning are orchestrated by populations of heterogeneous interconnected DNs. Moreover, these studies also show that this DN-DN network has some degree of hierarchy with some DNs being upstream to other DNs. In this contemporary view of steering control, DNa02 (like DNg13 from Yang et al 2023) is a downstream DN that is recruited by hierarchically upstream DNs like DNa03, DNp09, etc. In this view, DNa02 is likely to be involved in most turning events, but by itself unable to drive all the motor outputs required for the said events. This reasoning could be used while discussing the lack of major phenotypes with DNa02 activation or inactivation observed in the current study, which is in stark contrast to strong phenotypes observed in the case of hierarchically upstream DNs like DNp09 or DNa03. In the section, "Contributions of single descending neuron types to steering behavior": the authors start off by asking if individual DNs can make measurable contributions to steering behavior. Once more, any citations to DNp09 or DNa03 - two DNs that are clearly shown to drive strong turning-on activation (Bidaye et al, 2020, Feng et al 2024) - are lacking. Besides misleading the reader, such statements also digress the results away from contemporary knowledge in the field. I appreciate that the brief discussion in the section titled "Ensemble codes for steering" tries to cover these recent updates. However, I think this would serve a better purpose in the introduction and help guide the results.

      We apologize for these omissions of relevant citations, which we have now fixed. Specifically, in our revised Discussion, we now point out that:

      - Braun et al. (2024) reported that bilateral optogenetic activation of either DNa02 or DNa01 can drive turning (in either direction). 

      - Braun et al. (2024) also identified DNb02 as a steering-related DN.

      - Bidaye et al. (2020), Sapkal et al. (2024), and Braun et al. (2024) all contributed to the identification of DNp09 as a broadcaster DN with the capacity to promote ipsiversive turning.

      We have also revised the beginning of the Results section titled “Contributions of single descending neuron types to steering behavior”, as suggested by the Reviewer.

      Finally, we agree with the Reviewer’s overall point that steering is influenced by multiple DNs. We have not claimed that any DN is solely responsible for steering. As we note in the Discussion: “We found that optogenetically inhibiting DNa01 produced only small defects in steering, and inhibiting DNa02 did not produce statistically significant effects on steering; these results make sense if DNa02 is just one of many steering DNs.”

      (2) The second major weakness is the lack of any immunohistochemistry (IHC) images quantifying the expression of the genetic tools used in these studies. Even though the main split-Gal4 tools for DNa01 and DNa02 were previously reported by Namiki et al, 2018, it is important to document the expression with the effectors used in this work and explicitly mention the expression in any ectopic neurons. Similarly, for any experiments where drivers were combined together (double recordings, functional connectivity) or modified for stochastic expression (Figure 8), IHC images are absolutely necessary. Without this evidence, it is difficult to trust many of the results (especially in the case of behavioral experiments in Figure 8). For example, the DNa01 genetic driver used by the authors is also expressed in some neurons in the nerve cord (as shown on the Flylight webpage of Janelia Research Campus). One wonders if all or part of the results described in Figure 8 are due to DNa01 manipulation or manipulation of the nerve cord neurons. The same applies for optic lobe neurons in the DNa02 driver.

      This is a reasonable request. We used DN split-Gal4 lines to express three types of UAS-linked transgenes:

      (1) GFP

      In these flies, we know that expression in DNs is restricted to the DN types in question, based on published work (Namiki et al., 2018), as well as the fact that we see one labeled DN soma per hemisphere. When we label both cells with GFP, we use the spike waveform to identify DNa02 and DNa01, as described in Figure S1.

      (2) ReaChR

      In these flies, expression patterns were different in different flies because ReaChR expression was stochastically sparsened using hs-FLP. Expression was validated in each fly after the experiment, as described in the Methods (“Stochastic ReaChR expression”). hs-FLP-mediated sparsening will necessarily produce stochastic patterns of expression in both DNa02 and off-target cells, and this is true of all the flies in this experiment. What makes the “unilateral” flies distinct from the “bilateral” flies is that unilateral flies express ReaChR in one copy of DNa02, whereas bilateral flies express ReaChR in both copies of DNa02. On average, off-target expression will be the same in both groups.

      (3) GtACR1

      In these flies, we initially assumed that GtACR1 expression was the same as GFP expression under control of the same driver. However, we agree with the reviewer’s point that these two expression patterns are not necessarily identical. Therefore, to address the reviewer’s question, we performed immunofluorescence microscopy to characterize GtACR1 patterns in the brain and VNC of both genotypes. These expression patterns are now shown in a new supplemental figure (Figure S8). This figure shows that, as it happens, expression of GtACR1 is indeed indistinguishable from the GFP expression patterns for the same lines (archived on the FlyLight website). Both DN split-Gal4 lines are largely selective for the DNs in question, with limited off-target labeling. We have now drawn attention to this off-target labeling in the last paragraph of the Results, where the GtACR1 results are discussed.

      (3) The paper starts off with a comparative analysis of the roles of DNa01 and DNa02 during steering. Unfortunately, after this initial analysis, DNa01 is largely ignored for further characterization (e.g. with respect to inputs, connectomics, etc.), only to return in the final figure for behavioral characterization where DNa01 seems to have a stronger silencing phenotype compared to DNa02. I couldn't find an explanation for this imbalance in the characterization of DNa01 versus DNa02. Is this due to technical reasons? Or was it an informed decision due to some results? In addition to being a biased characterization, this also results in the manuscript lacking a coherent thread, which in turn makes it a bit inaccessible to the non-specialist.

      Yes, the first portion of the manuscript focuses on DNa01 and DNa02. The latter part of the manuscript transitions to focus mainly on DNa02. 

      Our rationale is noted at the point in the manuscript where we make this transition, with the section titled “Steering toward internal goals”: “Having identified steering-related DNs, we proceeded to investigate the brain circuits that provide input to these DNs. Here we decided to focus on DNa02, as this cell’s activity is predictive of larger steering maneuvers.” When we say that DNa02 is predictive of larger steering maneuvers, we are referring to several specific results:

      - We obtain larger filter amplitudes for DNa02 versus DNa01 (Fig. 2A-C). This means that, just after a unit change in DN firing rate, we see on average a larger change in steering velocity for DNa02 versus DNa01.

      - The linear filter for DNa02 has a higher variance explained, as compared to DNa01 (Fig. 2D). This means that DNa02 is more predictive of steering.

      - The relationship between firing rate and rotational velocity (150 ms later) is steeper for DNa02 than for DNa01 (Fig. 2G). This means that, if we ignore dynamics and we just regress firing rate against subsequent rotational velocity, we see a higher-gain relationship for DNa02.
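      The third comparison above (a fixed-lag regression of rotational velocity on firing rate) can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function name `lagged_gain`, its arguments, and the conversion of the 150 ms lag into a number of samples are all hypothetical, and an ordinary least-squares slope is assumed as the measure of gain.

```python
def lagged_gain(firing_rate, rot_velocity, lag_samples):
    """Least-squares slope of rot_velocity[t + lag_samples] regressed on firing_rate[t].

    firing_rate and rot_velocity are equal-length sequences sampled at the same
    (assumed) rate; lag_samples is the 150 ms lag expressed in samples.
    """
    n = len(firing_rate) - lag_samples
    x = firing_rate[:n]                            # firing rate at time t
    y = rot_velocity[lag_samples:lag_samples + n]  # velocity at time t + lag
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance                   # steeper slope = higher gain
```

      Under these assumptions, a cell type with a steeper slope produces a larger change in subsequent rotational velocity per unit change in firing rate, which is the sense in which one DN can be called higher-gain than another.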

      Our focus on DNa02 was also driven by connectivity considerations. In the same paragraph (the first paragraph in the section titled “Steering toward internal goals”), we note that “there are strong anatomical pathways from the central complex to DNa02”; the same is not true of DNa01. This point has also been noted by other investigators (Hulse et al. 2021).

      We don’t think this focus on DNa02 makes our work biased or inaccessible. Any study must balance breadth with depth. A useful general way to balance these constraints is to begin a study with a somewhat broader scope, and then narrow the study’s focus to obtain more in-depth information. Here, we began with a comparative study of two cell types, and we progressed to the cell type that we found more compelling.

      (4) There seems to be a discrepancy with regard to what is emphasized in the main text and what is shown in Figures S3/S4 in relation to the role of these DNs in backward walking. There are only two sentences in the main text where these figures are cited.

      a) "DNa01 and DNa02 firing rate increases were not consistently followed by large changes in forward velocity (Figs. 1G and S3)."

      b) "We found that rotational velocity was consistently related to the difference in right-left firing rates (Fig. 3B). This relationship was essentially linear through its entire dynamic range, and was consistent across paired recordings (Fig. 3C). It was also consistent during backward walking, as well as forward walking (Fig. S4)." These main text sentences imply the role of the difference between left and right DNa02 in turning. However, the actual plots in the Figures S3 and S4 and their respective legends seem to imply a role in "backward walking". For instance, see this sentence from the legend of Figure S3 "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward. When (firing rateDNa02>>firing rateDNa01), the fly is also often moving backward, but forward movement is still more common overall, and so the net effect is that forward velocity is small but still positive when (firing rateDNa02>>firing rateDNa01). Note that when we condition our analysis on behavior rather than neural activity, we do see that backward walking is associated with a large firing rate differential (Fig. S4)." This sort of discrepancy in what is emphasized in the text, versus what is emphasized in the figures, ends up confusing the reader. More importantly, I do not agree with any of these conclusions regarding the implication of backward walking. Both Figures S3 and S4 are riddled with caveats, misinterpretations, and small sample sizes. As a result, I actually support the authors' decision to not infer too much from these figures in the "main text". In fact, I would recommend going one step further and removing/modifying these figures to focus on the role of "rotational velocity". Please find my concerns about these two figures below:

      a) In Figures S3 and S4, every heat map has a different scale for the same parameter: forward velocity. S3A is -10 to +10 mm/s, S3B is -6 to +6, S4B (left) is -12 to +12, and S4B (right) is -4 to +4. Since the authors are trying to depict results based on the color-coding this is highly problematic.

      b) Figure S3A legend "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward." There are also several instances when ΔvoltageDNa02= ΔvoltageDNa01 and both are low (lower left quadrant) when the fly is typically moving backwards. So in my opinion, this figure in fact suggests DNa02 has no role in backward velocity control.

      c) Based on the example traces in S4A, every time the fly walks backwards it is also turning. Based on this it is important to show absolute rotational velocity in Figure S4C. It could be that the fly is turning around the backward peak which would change the interpretation from Figure S4C. Also, it is important to note that the backward velocities in S4A are unprecedentedly high. No previous reports show flies walking backwards at such high velocities (for example see Chen et al 2018, Nat Comm. for backward walking velocities on a similar setup).

      d) In my opinion, Figure S4D showing that right-left DNa02 correlates with rotational velocity, regardless of whether the fly is in a forward or backward walking state, is the only important and conclusive result in Figures S3/S4. These figures should be rearranged to only emphasize this panel.

      We agree that it is difficult to interpret some of the correlations between DN activity and forward velocity, given that forward velocity and rotational velocity are themselves correlated to some degree. This is why we did not make claims based on these results in the main text. In response to these comments, we have taken the Reviewer’s suggestion to preserve Figure S4D (now Figure S3). The other components of these supplemental figures have been removed.

      (5) Figure 3 shows a really nice analysis of the bilateral DNa02 recordings data. While Figure S5 [now Figure S4] shows that authors have a similar dataset for DNa01, a similar level analysis (Figures 3D, E) is not done for DNa01 data. Is there a reason why this is not done?

      The reason we did not do the same analysis for DNa01 is that we only have two paired DNa01-DNa01 recordings. It turned out to be substantially more difficult to perform DNa01-DNa01 recordings, as compared to DNa02-DNa02 recordings. For this reason, we were not able to get more than two of these recordings.

      (6) In Figure 4 since the authors have trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded, I wonder if the authors could quantify hyperpolarization in DNa02 as is predicted from connectomics data in Figure 7.

      We agree this is an interesting question. However, DNa02 firing rate and membrane potential are variable, and stimulus-evoked hyperpolarizations in these DNs tend to be relatively small (on the order of 1 mV, in the case of a contralateral fictive olfactory stimulus, Figure 5A). In the case of our fictive olfactory stimuli, we could look carefully for these hyperpolarizations because we had a very large number of trials, and we could align these trials precisely to stimulus onset. By contrast, for the bump-jump experiments, we have a more limited number of trials, and turning onset is not so tightly time-locked to the chemogenetic stimuli; for these reasons, we are hesitant to make claims about any bump-jump-related hyperpolarization in these trials.

      (7) Figure 6 suggests that DNa02 contains information about latent steering drives. This is really interesting. However, in order to unequivocally claim this, a higher-resolution postural analysis might be needed. Especially given that DNa02 activation does not reliably evoke ipsilateral turning, these "latent" steering events could actually contain significant postural changes driven by DNa02 (making them "not latent"). Without this information, at least the authors need to explicitly mention this caveat.

      This is a good point. We cannot exclude the possibility that DNa02 is driving postural changes when the fly is stopped, and these postural changes are so small we cannot detect them. In this case, however, there would still be an interesting mismatch between the stimulus-evoked change in DNa02 firing rate (which is large) and the stimulus-evoked postural response (which would be very small). We have added language to the relevant Results section in order to make this explicit.

      (8) Figure 7 would really benefit from connectome data with synapse numbers (or weighted arrows) and a corresponding analysis of DNa01.

      In response to this comment, we have added synapse number information (represented by weighted arrows) to Figures 7C, E, and F. We also added information to the Methods to explain how cells were chosen for inclusion in this diagram. (In brief: we thresholded these connections so as to discard connections with small numbers of synapses.)

      We did perform an analogous connectome circuit analysis for DNa01, but if we use the same thresholds as we do for DNa02, we obtain a much sparser connectivity graph. We now show this in a new supplemental figure (Figure S9). MBON32 makes no monosynaptic connections onto DNa01, and it only forms one disynaptic connection, via LAL018, which is relatively weak. PFL3 and PFL2 make no mono- or disynaptic connections onto DNa01 comparable in strength to what we find for DNa02. 

      The sparser connectivity graph for DNa01 is partly due to the fact that fewer cell types converge onto DNa01 as compared to DNa02 (110 cell types, versus 287 cell types). Also, it seems that DNa01 is simply less closely connected to the central complex and mushroom body, as compared to DNa02.

      (9) In Figure 8E, the most obvious neuronal silencing phenotype is decreased sideways velocity in the case of DNa01 optogenetic silencing. In Figure S2, the inverse filter for sideways velocity for DNa01 had a higher amplitude than the rotational velocity filter. Taken together, does this point at some role for DNa01 in sideways velocity specifically?

      No. The forward filters describe the average velocity impulse response, given a brief step change in firing rate.

      Figure 1 and Figure S2 show that the sideways velocity forward filter is actually smaller for DNa01 than for DNa02. This means that a brief step change in DNa01 firing rate is followed by only a very small sideways velocity response. Conversely, the reverse filters describe the average firing rate impulse response, given a brief step change in sideways velocity. Figure S2 shows that the sideways velocity reverse filter is larger for DNa01 than for DNa02, but this means that the relationship between DNa01 activity and sideways velocity is so weak that we would need to see a very large neural response in order to get a brief step change in sideways velocity. In other words, the reverse filter says that DNa01 likely has very little role in determining sideways velocity.

      (10) In Figure 8G, the effect on inner hind leg stance prolongation is very weak, and given the huge sample size, hard to interpret. Also, it is not clear how this fits with the role of DNa01 in slow sustained turning based on recordings.

      Yes, this effect is small in magnitude, which is not too surprising, given that many DNs seem to be involved in the control of steering in walking. To clarify the interpretation of these phenotypes, we have added a paragraph to the end of the Results:

      “All these effects are weak, and so they should be interpreted with caution. Also, both DN split-Gal4 lines drive expression in a few off-target cell types, which is another reason for caution (Fig. S8). However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would cause ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also cause ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.” We have also added caveats and clarifications in a new Discussion paragraph:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would cause ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found the sign conventions for rotational velocity particularly confusing. Figure 3 represents clockwise rotations as +ve values, but Figure 4H represents anticlockwise rotations as positive values. But for EPG bumps, anticlockwise rotations are given negative values. Please make them consistent unless I am missing something obvious.

      Different fields use different conventions for yaw velocity. In aeronautics, a clockwise turn is generally positive. In robotics and engineering of terrestrial vehicles, a counterclockwise turn is generally positive. Historically, most Drosophila studies that quantified rotational (yaw) velocity were focused on the behavior of flying flies, and these studies generally used the convention from aeronautics, where a clockwise turn is defined as a positive turn. When we began working in the field, we adopted this convention, in order to conform to previous literature. It might be argued that walking flies are more like robots than airplanes, but it seemed to us that it was confusing to have different conventions for different behaviors of the same animal. Thus, all of the published studies from our lab define clockwise rotation as having positive rotational velocity.

      Figure 4 focuses on the role of the central complex in steering. As the fly turns clockwise (rightward), the bump of activity in EPG neurons normally moves counterclockwise around the ellipsoid body, as viewed from the posterior side (Turner-Evans et al., 2017). The posterior view is the conventional way to represent these dynamics, because (1) we and others typically image the brain from the posterior side, not the anterior side, and (2) in a posterior view, the animal’s left is on the left side of the image, and vice versa. We have added a sentence to the Figure 4A legend to clarify these points.

      Previous work has shown that, when an experimenter artificially “jumps” the EPG bump, this causes the fly to make a compensatory turn that returns the bump to (approximately) its original location (Green et al., 2019). Our work supports this observation. Specifically, we find that clockwise bump jumps are generally followed by rightward turns (which drive the bump to return to its approximate original location via a counterclockwise path), and vice versa. This is noted in the Figure 4D legend. Note that Figure 4D plots the fly’s rotational velocity during the bump return, plotted against the initial bump jump. 

      Figure 4H shows that clockwise (blue) bump returns were typically preceded by leftward turning, and counterclockwise (green) bump returns were preceded by rightward turning, as expected. This is detailed in the Figure 4H legend, and it is consistent with the coordinate frame described above.

      (2) It would be helpful to have images of the DNa01 and DNa02 split lines used in this paper, considering this paper would most likely be used widely to describe the functions of these neurons. Similarly, images of their reconstructions would be a useful addition.

      High-quality three-dimensional confocal stacks of all the driver lines used in our study are publicly available. We have added this information to the Methods (under “Fly husbandry and genotypes”). Confocal images of the full morphologies of DNa01 and DNa02 have been previously published (Namiki et al., 2018). Figure 1A is a schematic that is intended to provide a quick visual summary of this information.

      EM reconstructions of DNa01 and DNa02 are publicly accessible in a whole-brain dataset (https://codex.flywire.ai/) and a whole-VNC dataset (https://neuprint.janelia.org/). Both datasets are referenced in our study. As these datasets are easy to search and browse via user-friendly web-based tools, we expect that interested readers will have no difficulty accessing the underlying datasets directly.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of the activity of the DNs that they "PREDICT steering during walking". This is an interesting word choice. Not causes, not correlates with, not encodes... does that mean the activity always precedes the action? Does that mean when you see activity, you will get behavior? This is important for assessing whether the DN activity is a cause or an effect. It is good to be cautious but it might be worth expanding on exactly what kind of connection is implied to justify the use of the word 'predict'.

      Conventionally, “predict” means “to indicate in advance”. We write that DNs “predict” certain features of behavior. We use this term because (1) these DNs correlate with certain features of behavior, and (2) changes in DN activity precede changes in behavior.

      The notion that neurons can “predict” behavior is not original to our study. Whenever neuroscientists summarize the relationship between neural activity and behavior by fitting a mathematical model (which may be as simple as a linear regression), the fitted model can be said to represent a “prediction” of behavior. These models are evaluated by comparing their predictions with measured behaviors. A good model is predictive, and this implies that the underlying neural signal is also predictive (Levenstein et al., 2023 Journal of Neuroscience 43: 1074-1088; DOI: 10.1523/JNEUROSCI.1179-22.2022). Here, prediction simply means correlation, without necessarily implying causation. We use “prediction” in the same sense.

      We do not think the term “prediction” implies determinism. Meteorologists are said to predict the weather, but it is understood that their predictions are probabilistic, not deterministic. Certainly, we would not claim that there is a deterministic relationship between DN activity and behavior. Figure 2D shows that neither DN type can explain all the variance in the fly’s rotational or sideways velocity. At the same time, both DNs have significant predictive power.

      We might equally say that these DNs “encode” behavior. We have chosen to use the word “predict” rather than “encode” because we do not think it is necessary to use the framework of symbolic communication in connection with these DNs.

      We agree with the Reviewer that it is helpful to test whether any neuron that “predicts” a behavior might also “cause” this behavior. In Figure 8, we show that directly perturbing these DNs can indeed alter locomotor behavior, which suggests a causal role. Connectome analyses also suggest a causal role for these DNs in locomotor behavior (Figure 1B; see especially Cheong et al., 2024).

      At the same time, it is clear from our results that these DNs are not “command neurons” for turning: they do not deterministically cause turning. Therefore, to avoid misunderstanding, we have generally been careful to summarize the results of our perturbation experiments by avoiding the statement that “this DN causes this behavior”. Rather, we have generally tried to say that “this DN influences this behavior”, or “this DN promotes this behavior”.

      (2) There is some concern about how the linear filter models were developed and then used to predict the relationship between firing rate and steering behavior: how exactly were the build and test data separated to avoid re-extracting the input? It reads like a self-fulfilling prophecy/tautology.

      We used conventional cross-validation for model fitting and evaluation. We apologize that this was not made explicit in our original submission; this was due to an oversight on our part. To be clear: linear filters were computed using the data from the first 20% of a given experiment. We then convolved each cell’s firing rate estimate with the computed Neuron→Behavior filter (the “forward filter”) using the data from the final 80% of the experiment, in order to generate behavioral predictions. Thus, when a model has high variance explained, this is not attributable to overfitting: rather, it quantifies the bona fide predictive power of the model. We have added this information to the Methods (under “Data analysis - Linear filter analysis”).
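      The chronological split described above can be sketched as follows. This is an illustrative sketch, not the code used in the study: the response does not state how the filters themselves were estimated, so ordinary least squares on a lagged design matrix is assumed here as a stand-in, and the names `lagged_design`, `fit_and_evaluate`, `n_lags`, and `train_fraction` are hypothetical.

```python
import numpy as np

def lagged_design(rate, n_lags):
    """Matrix whose row t is [rate[t], rate[t-1], ..., rate[t-n_lags+1]], zero-padded."""
    padded = np.concatenate([np.zeros(n_lags - 1), rate])
    columns = [padded[n_lags - 1 - k : len(padded) - k] for k in range(n_lags)]
    return np.stack(columns, axis=1)

def fit_and_evaluate(rate, velocity, n_lags, train_fraction=0.2):
    """Fit a forward filter on the first 20% of a recording, test on the final 80%."""
    split = int(len(rate) * train_fraction)
    # Fit: least-squares filter mapping firing rate history to velocity (an assumption).
    filt, *_ = np.linalg.lstsq(lagged_design(rate[:split], n_lags),
                               velocity[:split], rcond=None)
    # Test: multiplying by the lagged matrix is a causal convolution of the
    # held-out firing rate with the fitted filter.
    predicted = lagged_design(rate[split:], n_lags) @ filt
    residual = velocity[split:] - predicted
    variance_explained = 1.0 - residual.var() / velocity[split:].var()
    return filt, variance_explained
```

      Because the filter never sees the held-out segment during fitting, a high variance explained on that segment reflects predictive power rather than overfitting. Note that the first `n_lags - 1` held-out samples are predicted from a zero-padded history in this sketch, which slightly understates the fit.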

      (3) Typo right above Figure 2 [now Figure 1E]: I assume spike rate fluctuations in DNa02 precede DNa01?

      Fixed. Thank you for reading the manuscript carefully.

      (4) The description of the other manuscripts about neural control of the steering as "follow-up" papers is a bit diminishing. They were likely independent works on a similar theme that happened afterwards, rather than deliberate extensions of this paper, so "subsequent" might be a more accurate description.

      We apologize, as we did not intend this to be diminishing. Given this request, we have revised “follow-up” to “subsequent”.

      (5) The idea that DNa02 is high-gain because it is more directly connected to motor neurons is a hypothesis and this should be made clear. We really don't know the functional consequences of the directness of a path or the number of synapses, and which circuits you compare to would change this. DNa02 may be a higher gain than DNa01, but what about relative to the other DNs that enter pre-motor regions? How do you handle a few synapses and several neurons in a common class? All of these connectivity-based deductions await functional tests - like yours! I think it is better to make this clear so readers don't assume a higher level of certainty than we have.

      The Reviewer asks how we handled few-synapse connections, and how we combined neurons in the same class. We apologize for not making this explicit in our original submission. We have now added this information to the Methods. Briefly, to select cell types for inclusion in Figure 7C, we identified all individual cells postsynaptic to PFL3 and presynaptic to DNa02, discarding any unitary connections with <5 synapses. We then grouped unitary connections by cell type and summed all synapse numbers within each connection group (e.g., summing all synapses in all PFL3→LAL126 connections). We then discarded connection groups having <200 synapses or <1% of a cell type’s pre- or postsynaptic total. Reported connection weights are per hemisphere, i.e., half of the total within each connection group. For Figure 7F we did the same, but now discarding connection groups having <70 synapses or <0.4% of a cell type’s pre- or postsynaptic total. In Figure S9, we used the same procedures for analyzing connections onto DNa01.
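
      The selection steps above can be sketched as a small filter. This is a hedged illustration rather than the authors' pipeline: the function name `group_connections`, the input layout, and all synapse counts and totals in the toy data are invented for the example. It also assumes that the percentage criterion is applied to both the presynaptic type's total output and the postsynaptic type's total input, which the text does not spell out.

```python
from collections import defaultdict

def group_connections(unitary, out_totals, in_totals,
                      min_unitary=5, min_group=200, min_fraction=0.01):
    """unitary: iterable of (pre_type, post_type, n_synapses), one entry per cell pair.

    out_totals / in_totals: hypothetical per-type totals of output / input synapses.
    Returns per-hemisphere weights for the connection groups that survive.
    """
    grouped = defaultdict(int)
    for pre_type, post_type, n_syn in unitary:
        if n_syn >= min_unitary:              # discard unitary connections with <5 synapses
            grouped[(pre_type, post_type)] += n_syn

    kept = {}
    for (pre_type, post_type), total in grouped.items():
        frac_out = total / out_totals[pre_type]
        frac_in = total / in_totals[post_type]
        if total < min_group or min(frac_out, frac_in) < min_fraction:
            continue                          # discard weak connection groups
        kept[(pre_type, post_type)] = total / 2   # per hemisphere: half of the group total
    return kept

# Toy data (numbers are made up, not connectome values).
unitary = (
    [("PFL3", "LAL126", 10)] * 30      # 300 synapses in total
    + [("PFL3", "LAL126", 3)] * 5      # dropped: fewer than 5 synapses each
    + [("LAL126", "DNa02", 12)] * 10   # 120 synapses in total, below the group threshold
)
out_totals = {"PFL3": 5000, "LAL126": 6000}
in_totals = {"LAL126": 8000, "DNa02": 9000}

kept = group_connections(unitary, out_totals, in_totals)
print(kept)
```

      Under the same assumptions, the Figure 7F variant corresponds to calling the function with `min_group=70` and `min_fraction=0.004`.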

      We agree that it is tricky to infer function from connectome data, and this applies to motor neuron connectivity. We bring up DN connectivity onto motor neurons in two places. First, in the Results, we note that “steering filters (i.e., rotational and sideways velocity filters) were larger for DNa02 (Fig. 2A,B). This means that an impulse change in firing rate predicts a larger change in steering for this neuron. In other words, this result suggests that DNa02 operates with higher gain. This may be related to the fact that DNa02 makes more direct output synapses onto motor neurons (Fig. 1B) [emphasis added].” We feel this is a relatively conservative statement.

      Subsequently, in the Discussion, we ask, “why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B) [emphasis added].” Again, we feel this is a relatively conservative statement.

      To be sure, none of the motor neurons postsynaptic to DNa02 actually receive most of their synaptic input from DNa02 (or indeed any DN), and this is typical of motor neurons controlling leg muscles. Instead, leg motor neurons tend to get most of their input from interneurons rather than from DNs (Cheong et al. 2024). Available data suggest that the walking rhythm originates with intrinsic VNC central pattern generators, and the DNs that influence walking do so, in large part, by acting on VNC interneurons. These points have been detailed in recent connectome analyses (see especially Cheong et al. 2024).

      We are reluctant to broaden the scope of our connectome analyses to include other DNs for comparison, because we think these analyses are most appropriate to full-central-nervous-system-(CNS)-connectomes (brain and VNC together), which are currently under construction. Without a full-CNS-connectome, many of the DN axons in the VNC cannot be identified. In the future, we expect that full-CNS-connectomes will allow a systematic comparison of the input and output connectivity of all DN types, and probably also the tentative identification of new steering DNs. Those future analyses should generate new hypotheses about the specializations of DNa02, DNa01, and other DNs. Our study aims to help lay a conceptual foundation for that future work.

      (6) Given the emphasis on the DNa02 to Motor Neuron connectivity shown (Figure 1B) and multiple text mentions, could you include more analyses of which motor neurons are downstream and how these might be expected to affect leg movements? I would like to see the synapse numbers (Figure 1B) as well as the fraction of total output synapses. These additions would help understand the evidence for the "see-saw" model.

      We agree this is interesting. In follow-up work from our lab (Yang et al., 2023), we describe the detailed VNC connectivity linking DNa02 to motor neurons. We refer the Reviewer specifically to Figure 7 of that study (https://www.cell.com/cell/fulltext/S0092-8674(24)00962-0).

      We regret that the see-saw model was perhaps not clear in our original submission. Briefly, this model proposes that an increase in excitatory synaptic input to one DN (and/or a disinhibition of that DN) is often accompanied by an increase in inhibitory synaptic input to the contralateral DN. This model is motivated by connectome data on the brain inputs to DNa02 (Figure 7), along with our observation that excitation of one DN is often accompanied by inhibition of the contralateral DN (Figure 5). We have now added text to the Results in several places in order to clarify these points. 

      This model specifically pertains to the brain inputs to DNs; comparing the downstream targets of these DNs in the VNC would not be a test of this hypothesis. The Reviewer may be asking to see whether there is any connectivity in the brain from one DN to its contralateral partner. We do not find connections of this sort, aside from multisynaptic connections that rely on very weak links (~10 synapses per connection). Figure 7 depicts a much stronger basis for this hypothesis, involving feedforward see-saw connections from PFL3 and MBON32.

      (7) The conclusions from the data in Figure 8 could be explained more clearly. These seem like small effect sizes on subtle differences in leg movements - maybe like what was seen in granular control by Moonwalker's circuits? Measuring joint angles or step parameters might help clarify, but a summary description would help the reader.

      We agree that these results were not explained very well in our original submission. 

      In our revised manuscript, we have added a new paragraph to the end of this Results section providing some summary and interpretation:

      “All these effects are weak, and so they should be interpreted with caution. However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would promote ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also promote ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      Moreover, in the Discussion, we have also added a new paragraph that synthesizes these results with other results in our study, while also noting the limitations of our study:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would promote ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      In Figure 8D-H, we measure step parameters in freely walking flies during acute optogenetic inhibition of DNa01 and DNa02. In experiments measuring neural activity in flies walking on a spherical treadmill, we did not have a way to measure step parameters. Subsequently, this methodology was developed by Yang et al. (2023) and results for DNa02 are described in that study. 

      Reviewer #3 (Recommendations for the authors):

      Minor Points:

      (1) If space allows, actual membrane potential should be mentioned when raw recordings are shown (for example Figure 1D).

      We have now added absolute membrane potential information to Figure 1D.

      (2) Typo in the sentence "To address this issue directly, we looked closely at the timing of each cell's recruitment in our dual recordings, and found that spike rate fluctuations in DNa02 typically preceded the spike rate fluctuations in DNa02 (Fig. 2A)." The final word should be "DNa01".

      Fixed. Thank you for reading the manuscript carefully.

      (3) Figure 2A - although there aren't direct connections between a01 and a02 in the connectome, the authors never rule out functional connectivity between these two. Given a02 precedes a01, shouldn't this be addressed?

      In the full brain FAFB data set, there are two disynaptic connections from DNa02 onto the ipsilateral copy of DNa01. One connection is via CB0556 (which is GABAergic), and the other is via LAL018 (which is cholinergic). The relevant DNa02 output connections are very weak: each DNa02→CB0556 connection consists of 11 synapses, whereas each DNa02→LAL018 connection consists of 10 synapses (on average). Conversely, each CB0556→DNa01 connection consists of 29 synapses, whereas each LAL018→DNa01 connection consists of 64 synapses. In short, LAL018 is a nontrivial source of excitatory input to DNa01, but DNa02 is not positioned to exert much influence over LAL018, and the two disynaptic connections from DNa02 onto DNa01 have opposite signs. Thus, it seems unlikely that DNa02 is a major driver of DNa01 activity. At the same time, it is difficult to completely exclude this possibility, because we do not understand the logic of the very complicated premotor inputs to these DNs in the brain. Thus, we are hesitant to make a strong statement on this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Sammons, Masserini et al. examine the connectivity of different types of CA3 pyramidal cells ("thorny" and "athorny"), and how their connectivity putatively contributes to their relative timing in sharp-wave-like activity. First, using patch-clamp recordings, they characterize the degree of connectivity within and between athorny and thorny cells. Based upon these experimental results, they compute a synaptic product matrix, and use this to inform a computational model of CA3 activity. This model finds that this differential connectivity between these populations, augmented by two different types of inhibitory neurons, can account for the relative timing of activity observed in sharp waves in vivo.

      We thank the reviewer for reading our manuscript, as well as for their nice summary and constructive comments.

      Strengths:

      The patch-clamp experiments are exceptionally thorough and well done. These are very challenging experiments and the authors should be commended for their in-depth characterization of CA3 connectivity.

      Thank you for the recognition of our efforts.

      Weaknesses:

      (1) The computational elements of this study feel underdeveloped. Whereas the authors do a thorough job experimentally characterizing connections between excitatory neurons, the inhibitory neurons used in the model seem to be effectively "fit neurons" and appear to have been tuned to produce the emergent properties of CA3 sharp wave-like activity. Although I appreciate the goal was to implicate CA3 connectivity contributions to activity timing, a stronger relationship seems like it could be examined. For example, did the authors try to "break" their model? It would be informative if they attempted different synaptic product matrices (say, the juxtaposition of their experimental product matrix) and see whether experimentally-derived sequential activity could not be elicited. It seems as though this spirit of analysis was examined in Figure 4C, but only insofar as individual connectivity parameters were changed in isolation.

      Including the two interneuron types (B and C) in the model is, on the one hand, necessary to align our modeling framework to the state-of-the-art model by Evangelista et al. (2020), which assumes that these populations act as switchers between an SPW and a non-SPW state, and on the other hand, less straightforward because the connectivity involving these interneurons is largely unknown.

      For B cells, the primary criterion to set their connections to and from excitatory cells was to balance the effect of the strong recurrent excitation and to achieve a mid-range firing rate for each population during sharp wave events. Our new simulations (Figure 5B) show that the initial suppression of population T (resulting in the long delay) indeed depends in equal proportions on the outlined excitatory connections and on how strongly each excitatory population is targeted by the B interneurons. However, these simulations demonstrate that there is a broad, clearly distinct region of the parameter space that supports a long delay between the peaks, rather than a marginal set of fine-tuned parameters. In addition, the simulations show that B interneurons optimally contribute to the suppression of T when they primarily target T (Fig. 5B, panels 3, 7, 11, 12, 13) rather than A (panels 4, 8, 9, 10, 11). In contrast, as reported in the parameter table, and now also displayed graphically in the new Figure 4A (included above, with arrow sizes proportional to the synaptic product between the parameters determining the total strength of each connection), we assume that B targets A more strongly than T (to make up for the higher excitability of population A). Therefore, the long delay between the peaks in our model emerges in spite of the interneuron connectivity, rather than because of it, and it is an effect of the asymmetric connectivity between the two excitatory populations, in particular the extremely low connection from A to T.

      (2) Additional explanations of how parameters for interneurons were incorporated in the model would be very helpful. As it stands, it is difficult to understand the degree to which the parameters of these neurons are biologically constrained versus used as fit parameters to produce different time windows of activity in types of CA3 pyramidal cells.

      Response included in point (1).

      Reviewer #2 (Public Review):

      Sharp wave ripples are transient oscillations occurring in the hippocampus that are thought to play an important role in organising temporal sequences during the reactivation of neuronal activity. This study addresses the mechanism by which these temporal sequences are generated in the CA3 region focusing on two different subtypes of pyramidal neurons, thorny and athorny. Using high-quality electrophysiological recordings from up to 8 pyramidal neurons at a time the authors measure the connectivity rates between these pyramidal cell subtypes in a large dataset of 348 cells. This is a significant achievement and provides important data. The most striking finding is how similar connection characteristics are between cell types. There are no differences in synaptic strength or failure rates and some small differences in connectivity rates and short-term plasticity. Using model simulations, the authors explore the implications of the differences in connectivity rates for the temporal specificity of pyramidal cell firing within sharp-wave ripple events. The simulations show that the experimentally observed connectivity rates may contribute to the previously observed temporal sequence of pyramidal cell firing during sharp wave ripples.

      Thank you very much for your careful review of our manuscript and the overall positive assessment.

      The conclusions drawn from the simulations are not experimentally tested so remain theoretical. In the simple network model, the authors include basket cell and anti-SWR interneurons but the connectivity of these cell types is not measured experimentally and variations in interneuron parameters may also influence temporal specificity of firing.

      As variations in some of these parameters can indeed influence the temporal specificity of firing, we have now performed additional simulations, the results of which are in the new Figures 5 and S5. Please also see response to Reviewer 1, point 1.

      In addition, the influence of short-term plasticity measured in their experiments is not tested in the model.

      We have now included short-term synaptic depression in all the excitatory-to-excitatory synapses and compensated for the weakened recurrent excitation by scaling some of the other parameters. The results of re-running our simulations in this alternative version of the model are reported in Figure S3 and are qualitatively analogous to those in Figure 4.

      Interestingly, the experimental data reveal a large variability in many of the measured parameters. This may strongly influence the firing of pyramidal cells during SWRs but it is not represented within the model which uses the averaged data.

      We have now incorporated variability in the following simulation parameters: the strength and latency of the four excitatory-to-excitatory connections as well as the reversal potential and leak conductance of both types of pyramidal cells, assuming variabilities similar to those observed experimentally (see Materials and Methods for details). Upon a slight re-balancing of some inhibitory connection strengths, in order to achieve comparable firing rates, we found that this version of the model also supports the generation of sharp waves with two pyramidal components (Figure S4B), and is, thus, fully analogous to our basic model. Varying the excitatory connectivities as in the original simulations (cf. Figure 4C and Figure S4C) reveals that increasing the athorny-to-athorny or decreasing the athorny-to-thorny connectivity still increases the delay between the peaks, although for some connectivity values the peak of the athorny population appears more spread out in time.

      Reviewer #3 (Public Review):

      Summary:

      The hippocampal CA3 region is generally considered to be the primary site of initiation of sharp wave ripples, highly synchronous population events involved in learning and memory, although the precise mechanism remains elusive. A recent study revealed that CA3 comprises two distinct pyramidal cell populations: thorny cells that receive mossy fiber input from the dentate gyrus, and athorny cells that do not. That study also showed that it is athorny cells in particular that play a key role in sharp wave initiation. In the present work, Sammons, Masserini, and colleagues expand on this by examining the connectivity probabilities among and between thorny and athorny cells. First, using whole-cell patch clamp recordings, they find an asymmetrical connectivity pattern, with athorny cells receiving the most synaptic connections from both athorny and thorny cells, and thorny cells receiving fewer. They then demonstrate in spiking neural network simulations how this asymmetrical connectivity may underlie the preferential role of athorny cells in sharp wave initiation.

      Strengths:

      The authors provide independent validation of some of the findings by Hunt et al. (2018) concerning the distinction between thorny and athorny pyramidal cells in CA3 and advance our understanding of their differential integration in CA3 microcircuits. The properties of excitatory connections among and between thorny and athorny cells described by the authors will be key in understanding CA3 functions including, but not limited to, sharp wave initiation.

      As stated in the paper, the modeling results lend support to the idea that the increased excitatory connectivity towards athorny cells plays a key role in causing them to fire before thorny cells in sharp waves. More generally, the model adds to an expanding pool of models of sharp wave ripples which should prove useful in guiding and interpreting experimental research.

      Thank you very much for your careful review of our manuscript and this positive assessment.

      Weaknesses:

      The mechanism by which athorny cells initiate sharp waves in the model is somewhat confusingly described. As far as I understood, random fluctuations in the activities of A and B neurons provide windows of opportunity for pyramidal cells to fire if they have additionally recovered from adaptive currents. Thorny and athorny pyramidal cells are then set in a winner-takes-all competition which is quickly won by the athorny cells. The main thesis of the paper seems to be that athorny cells win this competition because they receive more inputs both from themselves and from thorny cells, hence, the connectivity "underlies the sequential activation". However, it is also stated that athorny cells activate first due to their lower rheobase and steeper f-I curve, and it is also indicated in the methods that athorny (but not thorny) cells fire in bursts. It seems that it is primarily these features that make them fire first, something which apparently happens even when the A to A connectivity is set to 0, albeit with a very small lag. Perhaps the authors could further clarify the differential role of single cell and network parameters in determining the sequential activation of athorny and thorny cells. Is the role of asymmetric excitatory connectivity only to enhance the initial intrinsic advantage of athorny cells? If so, could this advantage also be enhanced in other ways?

      Thank you for the time invested in the review of our manuscript. We especially thank you for pointing out that the description of these dynamics was unclear: we have now improved it in the main text and we provide here an additional summary. As correctly highlighted by Reviewer 3, athorny neurons (A) are more excitable than thorny (T) ones due to single-neuron parameters: therefore, if there is a winner-takes-all competition, they are going to win it. Whether there is a competition in the first place, however, depends on the excitatory (and inhibitory) connections. In particular, we should distinguish two questions: does the activity of populations A and B (PV baskets), without adaptation (so at the beginning of the sharp wave) suppress T? And does the activity of populations T and B suppress A?

      The four possible combinations can be appreciated, for example, in the new Figure 5A5. If A can suppress T, but T cannot suppress A (low A-to-T, high T-to-A, bottom right corner, like in the data), A “wins” and T fires later, after a long delay. If both A and T can suppress each other (both cross-connections are low, bottom left corner), we still get the same outcome: A wins because of its earlier and sharper onset (due to single-neuron parameters). If neither population can suppress the other (high cross-connections, top right corner), then there is no competition and the populations reach the peak approximately at the same time. Only in the case in which T can suppress A, but A cannot suppress T (low T-to-A, high A-to-T, top left corner, opposite to the data), then A “loses” the competition. However, since A neurons nevertheless display some early activity (again, due to the single neuron parameters), this scenario is not as clean as the reversed one: rather, A cells have an initial, small peak, then T neurons quickly take over and grow to their own peak, and then, depending on how strongly T neurons suppress A neurons, there may or may not be a second peak for the A neurons. This is the reason why, in the top left corner of Figure 5B, the statistics show either a long positive or long negative delay, depending on whether the first (small) or second (absent, for some parameters) peak of A is taken into account. In summary, the experimentally measured connectivity does not only enhance the initial intrinsic advantage of A cells, but sets up the competitive dynamics in the first place, which are crucial for the emergence of two distinct peaks, rather than a single peak involving both populations.
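
      The four cases above can be condensed into a small lookup. This is only a qualitative summary of the verbal description, not a simulation: the function name and the two boolean arguments are illustrative, and the returned strings paraphrase the outcomes described in this response.

```python
def competition_outcome(a_suppresses_t: bool, t_suppresses_a: bool) -> str:
    """Qualitative outcome of the A/T competition at sharp wave onset."""
    if a_suppresses_t:
        # Covers both one-sided suppression by A (as in the data) and mutual
        # suppression: A wins because of its earlier and sharper onset.
        return "A peaks first, T follows after a long delay"
    if t_suppresses_a:
        # Opposite to the data: A still shows some early activity before T takes over.
        return "A shows a small early peak, then T takes over"
    # Neither population can suppress the other.
    return "no competition: A and T peak at about the same time"

for a_sup in (True, False):
    for t_sup in (True, False):
        print(f"A suppresses T={a_sup}, T suppresses A={t_sup}: "
              f"{competition_outcome(a_sup, t_sup)}")
```

      In this summary the measured connectivity (low A-to-T, high T-to-A) corresponds to the case in which A suppresses T but T does not suppress A.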

      Although a clear effort has been made to constrain the model with biological data, too many degrees of freedom remain that allow the modeler to make arbitrary decisions. This is not a problem in itself, but perhaps the authors could explain more of their reasoning and expand upon the differences between their modeling choices and those of others. For example, what are the conceptual or practical advantages of using adaptation in pyramidal neurons as opposed to short-term synaptic plasticity as in the model by Hunt et al.?

      It should be pointed out that the model by Hunt et al. features adaptation in pyramidal neurons as well, as the neuronal units employed are also adaptive-exponential integrate-and-fire. In an early stage of this project, we obtained from Hunt et al. the code for their model, and ascertained that adaptation is the main mechanism governing the alternations between the sharp-wave and the non-sharp-wave states, to the extent that fully removing short-term plasticity from their model does not have any significant impact on the network dynamics. Therefore, our choices are, in this regard, fully consistent with theirs. In order to confirm that synaptic depression does not significantly impact the dynamics in our model either, we have now performed additional simulations (Figure S3), addressed in the main text (lines 149-151) and in the response to Reviewer 1, who expressed similar concerns.

      Relatedly, what experimental observations could validate or falsify the proposed mechanisms?

      As sharp wave generation in this model relies on disinhibitory dynamics (suppression of the anti-sharp-wave interneurons C), the model could be validated/falsified by proving/disproving that a class of interneurons with anti-sharp-wave features exists. In addition, the mechanism we proposed for the long delay between the peaks of the athorny and thorny activity requires at least some connectivity from athorny to basket and from basket to thorny neurons.

      In the data by Hunt et al., thorny cells have a higher baseline (non-SPW) firing rate, and it is claimed that it is actually stochastic correlations in their firing that are amplified by athorny cells to initiate sharp waves. However, in the current model, the firing of both types of pyramidal cells outside of ripples appears to be essentially zero. Can the model handle more realistic firing rates as described by Hunt et al., or as produced by e.g., walking around an environment tiled with place cells, or would that trigger SPWs continuously?

      When building this model, we aimed at having two clearly distinct states the network could alternate between, so we picked a rather polarized connectivity to and from the anti-sharp-wave cells (C), resulting in polarized states. As a result, we obtain a low, although non-zero, activity of pyramidal neurons in non-SPW states (0.4 spikes/s for athorny and 0.2 spikes/s for thorny). These assumptions can be partially relaxed, for example in the original model by Evangelista et al. (2020), where the background firing rate of pyramidal cells is ~2 spikes/s. It should also be noted that, when walking in an environment tiled with place cells, the hippocampus is subject to additional extra-hippocampal inputs (e.g. from the medial septum, resulting in theta oscillations) and to neuromodulation, which can alter the network in various ways that we have not included in our model. However, our results do not contradict transient SPW-like activity states initiated at a certain phase of the theta oscillation, when the inhibition is weakest.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript reads like it was intended as a short-form manuscript for another journal. The introduction and discussion in particular are very brief and would benefit from being expanded and providing a bigger picture for the reader.

      We had originally aimed to submit in the eLife “short report” format. However, also thanks to the suggestion of Reviewer 1, we realized that our text would be better supported by extended introduction and discussion sections, as well as additional figures.

      (2) Graphs would benefit from including all datapoints, where appropriate.

      All datapoints have now been added to boxplots in the main figures and supplement.

      (3) The panels of Figure 4 are laid out strangely, it may be worthwhile to adjust.

      We thank the reviewer for this suggestion. We have now adjusted the layout of Figure 4 and believe it is now easier to follow.

      Reviewer #2 (Recommendations For The Authors):

      Useful points to address include:

      (1) Explore within the model the effect of altering interneuron connectivity. Are there other factors that can influence temporal specificity within SWRs?

      The effects of varying the connectivity to and from B interneurons (the ones which are SPW-active and therefore relevant for temporal specificity) have now been investigated in the new Figure 5B, in which such parameters were varied in pairs or combined with the two most relevant excitatory-to-excitatory connections.

      (2) Implement the experimentally observed short-term plasticity in the model to determine how this influences temporal specificity.

      All the findings in Figure 4 have now been replicated in the new Figure S3, in which excitatory-to-excitatory synapses feature synaptic depression.

      (3) Consider if it is possible to incorporate observed experimental variability in the model and explore the implications.

      All the findings in Figure 4 have now been replicated in the new Figure S4, in which heterogeneity has been introduced in multiple neuronal and synaptic parameters of thorny and athorny neurons.

      (4) Include the co-connectivity rates in the data. I.e., how many of the recorded neurons are reciprocally connected? Does this change the model simulations?

      We have now added the rates of reciprocal connections that we observed into the main text (lines 86-88). We found 2 pairs of reciprocally connected athorny neurons and 2 pairs of reciprocally connected thorny neurons. These rates of reciprocity were not statistically significant. We did not observe reciprocal connections in other paired neuron combinations (i.e. athorny-thorny or vice versa). Co-connectivity does not have any effect on the model simulations, as the model includes thousands of neurons grouped in populations without specific sub-structures. It might, however, be more relevant if the excitatory populations were further subdivided into assemblies.

      Reviewer #3 (Recommendations For The Authors):

      (1) Specify which part of CA3 you are recording from.

      We have added this information into our results section - we recorded from 20 cells in CA3a, 274 cells in CA3b and 54 cells in CA3c. This information can now be found in the text on lines 68-69.

      (2) Comment on why you might observe a larger fraction of athorny cells than Hunt et al.

      Hunt et al. cite a broad range for the fraction of athorny cells in their discussion (10-20%). It is unclear where these estimates originate from. In their study, Hunt et al. use the bursting and non-bursting phenotypes as proxies for athorny and thorny cells respectively, and report counts of 32 and 70, equating to 31% athorny and 69% thorny. This fraction of athorny cells is more or less in line with our own findings, albeit slightly lower (34% and 66%). However, we believe this difference falls within the range of experimental variability. One caveat is that our electrophysiological recordings likely represent a biased sample of cells. In particular, with multipatch recordings, placement of later electrodes is often restricted to the borders of the pyramidal layer so as not to disturb already patched cells. Thus, our recorded cells do not represent a fully random sample of CA3 pyramidal cells. We believe that only once a reliable genetic marker for athorny cells has been established can the size of this cell population be properly estimated. Furthermore, the ratio of thorny and athorny cells varies along the proximal-distal axis of CA3, so differences in ratios seen between our study and Hunt et al. may arise from sampling differences along this axis.

      (3) In Figure 3, Aiii (the cell fractions) could also be represented as a vector of two squares stacked one on top of the other, then you could add multiplication signs between Ai, Aii and Aiii, and an equal sign between Aiii and Aiv.

      Thank you! We have implemented this very nice suggestion.

      (4) In Figure 4A, it would be helpful to display the strength of the connections similar to how it is done in Figure 3B.

      We thank the reviewer for this suggestion. We have now updated Fig 4A to include connection strengths.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cognitive and brain development during the first two years of life is vast and determinant for later development. However, longitudinal infant studies are complicated and restricted to occidental high-income countries. This study uses fNIRS to investigate the developmental trajectories of functional connectivity networks in infants from a rural community in Gambia. In addition to resting-state data collected from 5 to 24 months, the authors collected growth measures from birth until 24 months and administered an executive functioning task at 3 or 5 years old.

      The results show left and right frontal-middle and right frontal-posterior negative connections at 5 months that increase with age (i.e., become less negative). Interestingly, contrary to previous findings in high-income countries, there was a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months. Additionally, the study describes that some connectivity patterns related to better cognitive flexibility at pre-school age.

      Strengths:

      - The authors analyze data from 204 infants from a rural area of Gambia, already a big sample for most infant studies. The study might encourage more research on different underrepresented infant populations (i.e., infants not living in occidental high-income countries).

      - The study shows that fNIRS is a feasible instrument to investigate cognitive development when access to fMRI is not possible or outside a lab setting.

      - The fNIRS data preprocessing and analysis are well-planned, implemented, and carefully described. For example, the authors report how the choices in the parameters for the motion artifacts detection algorithm affect data rejection and show how connectivity stability varies with the length of the data segment to justify the threshold of at least 250 seconds free of artifacts for inclusion.

      - The authors use proper statistical methods for analysis, considering the complexity of the dataset.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - No co-registration of the optodes is implemented. The authors checked for correct placement by looking at pictures taken during the testing session. However, head shape and size differences might affect the results, especially considering that the study involves infants from 5 months to 24 months and that the same fNIRS array was used at all ages.

      The fNIRS array used in this work was co-registered onto age-appropriate MNI templates at every time point in a previously published work: L. H. Collins-Jones et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’. The procedure mentioned by the reviewer, involving the examination of pictures showing the placement of headbands on participants, aimed to exclude infants with excessive cap displacement from further analysis.

      - The authors regress the global signal to remove systemic physiological noise. While the authors also report the changes in connectivity without global signal regression, there are some critical differences. In particular, the apparent decrease in frontal inter-hemispheric connections is not present when global signal regression is omitted, even though it is present for deoxy-Hb. The authors use connectivity results obtained after applying global signal regression for further analysis. The choice of regressing the global signal is questionable since it has been shown to introduce anti-correlations in fMRI data (Murphy et al., 2009), and fNIRS in young infants does not seem to be highly affected by physiological noise (Emberson et al., 2016). Systemic physiological noise might change at different ages, which makes its removal critical to investigate functional network development. However, global signal regression might also affect the data differently. The study would have benefited from having short separation channels to measure the systemic physiological component in the data.

      The work of Emberson et al. (2016) mentioned by the reviewer indeed highlights the challenges of removing systemic changes from the infants’ haemodynamic signal with short-separation channels (SSC). In fact, even an SSC of 1 cm detected changes in the blood in the brain; therefore, by regressing this signal from the recorded one, the authors removed both systemic changes AND haemodynamic signal. This paper from Emberson et al. (2016) is taken as a reference in the field to suggest that SSC might not be an ideal tool to remove systemic changes when collecting fNIRS data on young infants, as we did in this work.

      We agree with the reviewer's observation that systemic physiological noise may vary with age and among infants. Therefore, for each infant at each age, we regressed the mean value calculated across all channels. This ensures that the regressed signal is not biased by averaged calculations at group levels.
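
      The per-infant regression described above can be illustrated with a minimal sketch. It assumes NumPy, synthetic data, and a hypothetical function name (`regress_global_signal`); it is not the authors' pipeline, only one common way to regress the across-channel mean out of each channel. The 34 channels follow from the 17 channels per hemisphere mentioned earlier; every other number is made up for illustration.

```python
import numpy as np

def regress_global_signal(data):
    """Remove the across-channel mean time course from every channel.

    data: array of shape (n_samples, n_channels) for ONE infant at ONE visit,
    so the regressor is never averaged across infants or ages.
    """
    data = np.asarray(data, dtype=float)
    # Global signal: mean across all channels at each time point.
    g = data.mean(axis=1)
    # Design matrix with an intercept and the global signal.
    X = np.column_stack([np.ones_like(g), g])
    # Least-squares fit of every channel on [1, g] in one call.
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    # Residuals are the "cleaned" channels.
    return data - X @ beta

# Synthetic example (assumption): independent channel noise plus a shared
# systemic component that inflates every pairwise correlation.
rng = np.random.default_rng(0)
n_samples, n_channels = 500, 34
systemic = rng.standard_normal(n_samples)
data = rng.standard_normal((n_samples, n_channels)) + 2.0 * systemic[:, None]

cleaned = regress_global_signal(data)
fc_before = np.corrcoef(data, rowvar=False)     # channel-by-channel FC
fc_after = np.corrcoef(cleaned, rowvar=False)
```

      In this sketch the cleaned channels sum to zero at every time point, so the average off-diagonal correlation is pushed slightly below zero. This is the mechanism behind the anti-correlation concern raised by the reviewer.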

      We are aware of the criticisms directed towards global signal regression in the fMRI literature, although some other works showed anticorrelations in functional connectivity networks both with and without global signal regression (Chai et al., 2012). Furthermore, Murphy himself revised his criticism on the use of global signal regression in functional connectivity analysis in one of his more recent works (Murphy & Fox, 2017). The fact that the decreased FC is significant in results from data pre-processed without global signal regression gives us confidence that this finding is statistically robust and not solely driven by this preprocessing choice in our pipeline.

      An interesting study by Abdalmalak et al. (2022) demonstrated that failing to correct for systemic changes using any method is inappropriate when estimating FC with fNIRS, as it can lead to a high risk of elevated connectivity across the whole brain (see Figure 4 of the mentioned paper). Consequently, we strongly advocate for the implementation of global signal regression in our analysis pipeline as a fundamental step for accurate functional connectivity estimations.

      References:

      Emberson, L. L., Crosswhite, S. L., Goodwin, J. R., Berger, A. J., & Aslin, R. N. (2016). Isolating the effects of surface vasculature in infant neuroimaging using short-distance optical channels: a combination of local and global effects. Neurophotonics, 3(3), 031406-031406.

      Chai, X. J., Castañón, A. N., Öngür, D., & Whitfield-Gabrieli, S. (2012). Anticorrelations in resting state networks without global signal regression. NeuroImage, 59(2), 1420–1428. https://doi.org/10.1515/9783050076010-014

      Murphy, K., & Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage, 154(November 2016), 169–173. https://doi.org/10.1016/j.neuroimage.2016.11.052

      Abdalmalak, A., Novi, S. L., Kazazian, K., Norton, L., Benaglia, T., Slessarev, M., ... & Owen, A. M. (2022). Effects of systemic physiology on mapping resting-state networks using functional near-infrared spectroscopy. Frontiers in neuroscience, 16, 803297.

      - I believe the authors bypass a fundamental point in their framing. When discussing the results, the authors compare the developmental trajectories of the infants tested in a rural area of Gambia with the trajectories reported in previous studies on infants growing in occidental high-income countries (likely in urban contexts) and attribute the differences to adverse effects (i.e., nutritional deficits). Differences in developmental trajectories might also derive from other environmental and cultural differences that do not necessarily lead to poor cognitive development.

      We agree with the reviewer that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to investigate this further” (line 238).

      - While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level, the evidence regarding the links between adverse situations, developmental trajectories, and later cognitive capacities is weaker. The authors find that early restricted growth predicts specific connectivity patterns at 24 months and that certain connectivity patterns at specific ages predict cognitive flexibility. However, the link between development trajectories (individual changes in connectivity) with growth and later cognitive capacities is missing. To address this question adequately, the study should have compared infants with different growing profiles or those who suffered or did not from undernutrition. However, as the authors discussed, they lacked statistical power.

      We agree with the reviewer, and indeed we highlighted this as one of the main limitations of our work: “Even given the large sample in our study, we were underpowered to test for group comparisons between sets of infants with distinct undernutrition growth profiles, e.g., infants with early poor growth that later resolved and infants with standard growth early that had a poor growth later. We were also underpowered to test the associations between early growth and FC on clinically undernourished infants (defined as having DWLZ two standard deviations below the mean)” (line 311, discussion section).

      We believe this is an important point to consider for the field, as it addresses the sample size required for studies investigating brain development in clinically malnourished infants. We hope this will serve as a valuable reference for future studies in the field. For example, a new study led by Prof. Sophie Moore and other members of the BRIGHT team (INDiGO) is currently recruiting six hundred pregnant women with the aim of obtaining a broader distribution of infants’ growth measures (https://www.kcl.ac.uk/research/sophie-moore-research-group).

      Reviewer #2 (Public Review):

      Summary and strengths:

      The article pertains to a topic of importance, specifically early life growth faltering, a marker of undernutrition, and how it influences brain functional connectivity and cognitive development. In addition, the data collection was laborious, and data preprocessing was quite rigorous to ensure data quality, utilizing cutting-edge preprocessing methods.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      However, the subsequent analysis and explanations were not very thorough, which made some results and conclusions less convincing. For example, corrections for multiple tests need to be consistently maintained; if the results do not survive multiple corrections, they should not be discussed as significant results. Additionally, alternative plans for analysis strategies could be worth exploring, e.g., using ΔFC in addition to FC at a certain age. Lastly, some analysis plans lacked a strong theoretical foundation, such as the relationship between functional connectivity (FC) between certain ROIs and the development of cognitive flexibility.

      Thus, as much as I admire the advanced analysis of connectivity that was conducted and the uniqueness of longitudinal fNIRS data from these samples (even the sheer effort to collect fNIRS longitudinally in a low-income country at such a scale!), I have reservations about the importance of this paper's contribution to the field in its present form. Major revisions are needed, in my opinion, to enhance the paper's quality. 

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings as well as hypothesis-generating findings that may not pass stringent significance thresholds. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      The relationship between FC and cognitive flexibility (as well as the relationship between growth and FC) has been explored focusing on those FC that showed a significant change with age, as specified in the results sections: ‘To investigate the impact of early nutritional status on FC at 24 months, we used multiple regression with the infant growth trajectory [...] and FC at 24 months [...]. To maximise power, we considered only those FC that showed a statistically significant change with age’ (line 183) and ‘To investigate whether FC early in life predicted cognitive flexibility at preschool age, we used multiple regression of FC across the first two years of life against later cognitive flexibility in preschoolers at three and five years. As per the analysis above, we focused on only those FC that showed a statistically significant change with age’ (line 198).

      We explored the possibility of investigating the relationship between changes in FC and changes in growth. However, the degrees of freedom in these analyses dropped dramatically (~25/30), thereby putting the significance and the meaning of the results at risk. We look forward to future longitudinal studies with less attrition across these time points to maintain the statistical power necessary to run such analyses.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants has a different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility at 4-5 years of age, but results did not survive corrections for multiple comparisons.

      Strengths:

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little-studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - Analyzing such a huge amount of collected data at several ages is not an easy task to test developmental relationships between growth, FC, and behavioral capacities. In its present form, this study and the performed analyses lack clarity, unity and perhaps modeling, as it suggests that all possible associations were tested in an exploratory way without clear mechanistic hypotheses. Would it be possible to specify some hypotheses to reduce the number of tests performed? In particular, considering metrics at specific ages or changes in the metrics with age might allow us to test different hypotheses: the authors might clarify what they expect specifically for growth-FC-behaviour associations. Since some FC measures and changes might be related to one another, would it be reasonable to consider a dimensionality reduction approach (e.g., ICA) to select a few components for further correlation analyses?

      We confirm that this work was motivated by a compelling theoretical question: whether neural mechanisms, specifically FC, can be influenced by early adversity, such as growth, and subsequently impact cognitive outcomes, such as cognitive flexibility. This aligns with the overarching goal of the BRIGHT project, established in 2015 (Lloyd-Fox, 2023). We believe this was evident throughout the manuscript in several instances, for example:

      - “The goal of the study was to investigate early physical growth in infancy, developmental trajectories of brain FC across the first two years of life, and cognitive outcome at school age in a longitudinal cohort of infants and children from rural Gambia, an environment with high rates of maternal and child undernutrition. Specifically, we aimed to: (i) investigate whether differences in physical growth through the first two years of life are related to FC at 24 months, and (ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children.” (page 4, introduction)

      - “This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age.” (page 6, discussion)

      - We had a clear hypothesis regarding short-range connectivity decreasing with age and long-range connectivity increasing with age, as stated at the end of the introduction: “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (page 4, line 147). However, we were not able to formulate clear hypotheses about the localization of these connections due to the scarcity of previous studies conducted within this age range, particularly in low-resource settings. The ROI approach for analysis was chosen to mitigate this challenge by reducing the number of comparisons while still enabling us to estimate the developmental trajectories of all the connections from which we acquired data.

      Regarding the use of a dimensionality reduction approach, we have not considered the use of ICA in our analysis. These methods require selecting a fixed number of components to remove from all participants. However, due to the high variability of infant fNIRS data across the five timepoints, we considered it untenable to precisely determine the number of components to remove at the group level. Such a procedure carries the risk of over-cleaning the data for some participants while leaving noise in for others (Di Lorenzo, 2019). We also felt that using PCA in this initial study would be beyond the scope of the brain-region-specific hypotheses and would be more appropriate in a follow-up analysis of these important data.

      References:

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      Di Lorenzo, R., Pirazzoli, L., Blasi, A., Bulgarelli, C., Hakuno, Y., Minagawa, Y., & Brigadoi, S. (2019). Recommendations for motion correction of infant fNIRS data applicable to multiple data sets and acquisition systems. NeuroImage, 200(April), 511–527.

      - It seems that neurodevelopmental trajectories over the whole period (5-24 months) are little investigated, and considering more robust statistical analyses would be an important aspect to strengthen the results. The discussion mentions the potential use of structural equation modelling analyses, which would be a relevant way to better describe such complex data.

      We appreciate the complexity of the dataset we are working with, which includes multiple measures and time points. Currently, our focus within the outputs from the BRIGHT project is on examining the relationship between selected measures. While this may not involve statistically advanced modelling at the moment, it is worth noting that most of the results presented in this work have survived correction for multiple comparisons, indicating their statistical robustness. We believe that more advanced statistical analyses are beyond the scope of this rich initial study. In the next phase of the project, known as BRIGHT IMPACT, our team is collaborating with statisticians and experts in statistical modelling to apply more sophisticated and advanced statistical techniques to the data.

      - Given the number of analyses performed, only describing results that survive correction for multiple comparisons is required. Unifying the correction approach (FDR / Bonferroni) is also recommended. For the association between cognitive flexibility and FC, results are not significant, and one might wonder why FC at specific ages was considered rather than the change in FC with age. One of the relevant questions of such a study would be whether early growth and later cognitive flexibility are related through FC development, but testing this would require a mediation analysis that was not performed.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      We did not perform a mediation analysis because i) ΔWLZ between birth and the subsequent time points positively predicted frontal interhemispheric FC at 24 months, and ii) frontal interhemispheric FC at 18 months (and right fronto-posterior connectivity at 24 months) predicted cognitive flexibility at preschool age. Because the frontal interhemispheric FC at 24 months, which was positively predicted by growth, did not significantly predict cognitive outcome at preschool age, we did not perform mediation models.

      The reviewer raised concerns about using different methods to correct for multiple comparisons throughout the work. Results showing changes in FC with age were Bonferroni corrected, while we used FDR correction for the regression analyses investigating the relationship between growth and FC, as well as FC and cognitive flexibility. Both methods have good control over Type I errors (false positives), but Bonferroni is very conservative, increasing the likelihood of Type II errors (false negatives). We considered Bonferroni an appropriate method for correcting results showing changes in FC with age, where we had a large sample with strong statistical power (i.e. linear mixed models with 132 participants who had at least 250 seconds of good data for 2 out of 5 visits). However, Bonferroni was too conservative for the regression analyses, with N between 57 and 78 (Acharya, 2014; Félix & Menezes, 2018; Narkevich et al., 2020; Narum, 2006; Olejnik et al., 1997).

      References:

      Acharya, A. (2014). A Complete Review of Controlling the FDR in a Multiple Comparison Problem Framework--The Benjamini-Hochberg Algorithm. ArXiv Preprint ArXiv:1406.7117.

      Félix, V. B., & Menezes, A. F. B. (2018). Comparisons of ten corrections methods for t-test in multiple comparisons via Monte Carlo study. Electronic Journal of Applied Statistical Analysis, 11(1), 74–91.

      Narkevich, A. N., Vinogradov, K. A., & Grjibovski, A. M. (2020). Multiple comparisons in biomedical research: the problem and its solutions. Ekologiya Cheloveka (Human Ecology), 27(10), 55–64.

      Narum, S. R. (2006). Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7, 783–787.

      Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational and Behavioral Statistics, 22(4), 389–406.
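
      The difference in conservativeness between the two corrections discussed above can be made concrete with a minimal sketch. It assumes NumPy and a set of made-up p-values; the function names are illustrative and the FDR procedure shown is the standard Benjamini-Hochberg step-up rule, which may or may not be the exact variant used in the manuscript.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ranks from smallest p
    thresholds = alpha * np.arange(1, m + 1) / m   # i/m * alpha for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank meeting its threshold
        reject[order[:k + 1]] = True               # reject everything up to that rank
    return reject

# Illustrative p-values only (not taken from the study).
pvals = [0.001, 0.008, 0.012, 0.020, 0.045, 0.300, 0.600]
n_bonferroni = int(bonferroni(pvals).sum())        # tests surviving Bonferroni
n_fdr = int(benjamini_hochberg(pvals).sum())       # tests surviving FDR
```

      With these seven illustrative p-values, Bonferroni keeps one test and Benjamini-Hochberg keeps four, which shows why the choice matters more when the sample, and therefore the power, is smaller.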

      - Growth is measured at different ages through different metrics. Justifying the use of weight-for-length z-scores would be welcome since weight-for-age z-scores might be a better marker of growth and possible undernutrition (this impacting potentially both weight and length). Showing the distributions of these z-scores at different ages would allow the reader to estimate the growth variability across infants.

      We consistently used WLZ as the metric to measure growth throughout. Our analysis investigating the relationship between growth (WLZ) and FC included HCZ at 7/14 days to correct for head size at birth. When selecting the best growth measure for this paper, we opted for WLZ over WAZ, given extant evidence that infants in our sample are smaller and shorter compared to the reference WHO standard for the same age group (Nabwera et al., 2017). Therefore, using WLZ allows us to adjust each infant's weight for its own length.

      References:

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.
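
      The kind of regression referred to above (FC at 24 months regressed on a change in WLZ, with HCZ near birth as a covariate) can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy; the variable names, the simulated coefficients and the sample size of 70 (chosen inside the 57 to 78 range quoted earlier) are assumptions, and the authors' actual model may include further terms.

```python
import numpy as np

def fit_ols(y, predictors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic data (assumption): none of these values come from the study.
rng = np.random.default_rng(1)
n = 70
delta_wlz = rng.standard_normal(n)    # change in weight-for-length z-score
hcz_birth = rng.standard_normal(n)    # head-circumference z-score covariate
fc_24m = (0.1 + 0.3 * delta_wlz - 0.2 * hcz_birth
          + 0.05 * rng.standard_normal(n))

# Coefficient on delta_wlz is the growth effect adjusted for head size.
beta = fit_ols(fc_24m, [delta_wlz, hcz_birth])
```

      Including the covariate in the same model means the growth coefficient is estimated with head size held constant, which is the sense in which the analysis "corrects" for it.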

      - Regarding FC, clarifications about the long-range vs short-range connections would be welcome, as well as drawing a summary of what is expected in terms of FC "typical" trajectory, for the different brain regions and connections, as a marker of typical development. For instance, the authors suggest that an increase in long-range connectivity vs a decrease in short-range is expected based on previous fNIRS studies. However anatomical studies of white matter growth and maturation would suggest the reverse pattern (short-range connections developing mostly after birth, contrarily to long-range connections prenatally).

      We expected an increase in long-range functional connectivity with age, as discussed in the introduction:

      - “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). This maturation in FC has been shown to be related to the cascading maturation of myelination and synaptogenesis (32, 33) - fundamental processes for healthy brain development (34)” (line 93, page 3, introduction);

      - “Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 103, page 3, introduction);

      - “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (line 147, page 4, introduction).

      Since inferences about FC patterns recorded with fNIRS are highly limited by the number and locations of the optodes, it is challenging to make strong inferences about specific brain regions. Moreover, infant FC fNIRS studies are still limited, which is why we focused our inferences on long-range versus short-range connectivity, without specifically pinpointing particular brain regions.

      Additionally, we were unable to locate the works mentioned by the reviewer regarding an increase in short-range white matter connectivity immediately after birth. On the contrary, we found several studies documenting an increase in white-matter long-range connectivity after birth, which is consistent with the hypothesised increase in FC long-range connectivity, such as:

      Yap, P. T., Fan, Y., Chen, Y., Gilmore, J. H., Lin, W., & Shen, D. (2011). Development trends of white matter connectivity in the first years of life. PloS one, 6(9), e24678.

      Dubois, J., Dehaene-Lambertz, G., Kulikova, S., Poupon, C., Hüppi, P. S., & Hertz-Pannier, L. (2014). The early development of brain white matter: a review of imaging studies in fetuses, newborns and infants. Neuroscience, 276, 48-71.

      Stephens, R. L., Langworthy, B. W., Short, S. J., Girault, J. B., Styner, M. A., & Gilmore, J. H. (2020). White matter development from birth to 6 years of age: a longitudinal study. Cerebral Cortex, 30(12), 6152-6168.

      Hagmann, P., Sporns, O., Madan, N., Cammoun, L., Pienaar, R., Wedeen, V. J., ... & Grant, P. E. (2010). White matter maturation reshapes structural connectivity in the late developing human brain. Proceedings of the National Academy of Sciences, 107(44), 19067-19072.

      Collin G, van den Heuvel MP. The ontogeny of the human connectome: development and dynamic changes of brain connectivity across the life span. Neuroscientist. 2013 Dec;19(6):616-28. doi: 10.1177/1073858413503712.

      The authors test associations between FC and growth, but making sense of such modulation results is difficult without a clearer view of developmental changes per se (e.g., what does an early negative FC mean? Is it an increase in FC when the value gets close to 0? In particular, at 24m, it seems that most FC values are not significantly different from 0, Figure 2B). Observing positive vs negative association effects depending on age is quite puzzling. It is also questionable, for some correlation analyses with cognitive flexibility, to focus on FC that changes with age but to consider FC at a given age.

      We thank the reviewer for bringing up this important point and understand that it requires some additional consideration. The negative FC values decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age. The trajectory seems to suggest that this will keep increasing with age but of course further data need to be collected to assess this.

      Unfortunately, when considering ΔFC to predict cognitive flexibility, the number of participants dropped substantially, with N=~15/20 infants per group of preschoolers, making it very challenging to interpret the results with meaningful statistical power.

      - The manuscript uses inappropriate terms "to predict", "prediction" whereas the conducted analyses are not prediction analyses but correlational.

      We thank the reviewer for giving us the opportunity to thoroughly revise the manuscript on this matter. In this work, we had clear hypotheses regarding which variables predicted which measures (such as growth predicting FC and FC predicting cognitive outcomes). Therefore, we performed regression analyses rather than correlational analyses to investigate these associations. Hence, we believe that using the terms ‘predict’ and ‘prediction’ is appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the introduction and discussion, the authors talk about the link between developmental trajectories and cognitive capacities, and undernutrition. However, they did not compare developmental trajectories but connectivity patterns at different ages with ΔWLZ and cognitive flexibility. I recommend that the authors rephrase the introduction and discussion.

      We thank the reviewer for pointing out places requiring better clarity in the text. We made edits throughout the introduction and discussion to better match our investigations. In particular we changed:

      - ‘our understanding of the relationships between early undernutrition, developmental trajectories of brain connectivity, and later cognitive outcomes is still very limited,’ to, ‘our understanding of the relationships between early undernutrition, brain connectivity, and later cognitive outcomes is still very limited’ (line 89, introduction);

      - ‘(ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children,’ to, ‘(ii) investigate if early FC has an impact on cognitive outcome at pre-school age in these children’ (line 137, introduction);

      - ‘This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age,’ to, ‘This study investigated how early adversity via undernutrition drives brain functional connectivity throughout the first two years of life and how these early functional connections are associated with cognitive flexibility at preschool age’ (line 215, discussion).

      (2) Considering most research is done in occidental high-income countries, and this work is one of the few presenting research in another context, I think the authors should discuss in the manuscript that differences with previous studies might also be due to environmental and cultural differences. Since the study lacks the statistical power to perform a statistical analysis that directly establishes a link between developmental trajectories and restricted growth and cognitive flexibility, the authors cannot disentangle which differences are related to undernutrition and which might result from growing up in a different environment. I recommend that the authors avoid phrases like (lines 57-58): "We observed that early physical growth before the fifth month of life drove optimal developmental trajectories of FC..." or (lines 223-224) "...our cohort of Gambian infants exhibit atypical developmental trajectories of functional connectivity...".

      We thank the reviewer for this observation, and we agree with the reviewer that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to explore this further” (line 238). We revised the whole manuscript to reflect similar statements.

      (3) To better interpret the results, it would be interesting to know if poor early growth predicts late cognitive flexibility in the tested sample and if the ΔWLZ distributions differ compared to a population in a high-income country where undernutrition is less frequent.

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      Mean and SD values of WLZ are reported in Table 3. The values at every age are negative, indicating that the infants' weight-for-length is below the expected norm at all ages. To our knowledge, no other studies have assessed changes in growth in an infant sample with similar closely spaced age time points in high-income countries, making comparisons on growth changes challenging.

      (4) It is unclear why WLZ at birth and HCZ at 7-14 days are included in the models. I imagine this is to ensure that differences are not due to growing restrictions before birth. It would be nice if the authors could explain this.

      As the reviewer pointed out, HCZ at 7-14 days was included to ensure associations between growth and FC are not due to physical differences at birth. This can be considered a 'baseline' measure for cerebral development, in the same way that WLZ at birth was used as a baseline for physical development. Therefore, we can more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth. We specified this in the manuscript as follows: “These analyses were adjusted by WLZ at birth and HCZ at 7/14 days, to more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth” (line 520, statistical analysis section in the method section).

      (5) Right frontal-posterior connections at 24 months negatively correlate with ΔWLZ. Thus, restricted growth results in stronger frontal-posterior connections at 24 months. However, the same connections at 24 months positively correlate with cognitive flexibility (stronger connections predict better cognitive flexibility). Do the authors have any interpretation of this? How could this relate to previous findings of the authors (Bulgarelli et al. 2020), showing first an increase and then a decrease in functional connectivity between frontal and parietal regions?

      We acknowledge that interpreting the negative relationship between changes in growth and fronto-posterior FC at 24 months, alongside the positive association between the same connection and later cognitive flexibility, is challenging. We refrain from relating these findings to those published by Bulgarelli et al. (2020) because of differences in optode locations and because in that work the decrease in fronto-posterior FC was observed after 24 months (up to 36 months), whereas the endpoint in this study is right at 24 months.

      (6) With the growth of the head, the frontal channels move to more temporal areas, right? Could this determine the decrease in frontal inter-hemisphere connections?

      As shown by Nabwera (2017), head size in Gambian infants does not increase as much as expected from the WHO standard measures. We have added HCZ mean and SD values per age in Table 3.

      Minor points

      - HCZ is used in line 184 but not defined.

      We thank the reviewer for spotting this; we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Table SI2: NIRS not undertaken = the participant was assessed but did want or could not perform... I imagine there is a missing "not".

      We thank the reviewer for spotting this; we have now modified the legend of Table SI2 as follows: ‘the participant was assessed but did not want or could not perform the NIRS assessments.’

      - The authors should explain what weight-for-length is for those who are not familiar with it.

      We have added an explanation of weight-for-length in the experimental design section, line 339 as follows: ‘We then tested for relationships between brain FC at age 24 months with measures of early growth, as indexed by changes in weight-for-length z-scores (reflecting body weight in proportion to attained growth in length) at one month of age, and at each of the four subsequent visits (details provided below).’

      Reviewer #2 (Recommendations For The Authors):

      (1) I am confused about the authors' interpretation that left and right front-middle and right front-back FC increased with age. It appears in Figure 2 that the negative FC among these ROIs should actually decrease with age. This means that as individuals grow older, the FC values between these regions and zero diminished, albeit starting with negative FC (anticorrelation values) in younger age groups.

      Yes, the reviewer is correct. The negative values of the left and right front-middle and right front-back FC decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age.
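
      The sign convention discussed here can be illustrated with a minimal sketch. The FC values below are hypothetical and are not taken from the study; they only show that an anti-correlation moving toward zero is a positive signed change even though its magnitude shrinks.

```python
# Hypothetical group-mean FC values at two ages (illustrative only, not study data).
fc_by_age = {"5mo": -0.30, "24mo": -0.05}

def signed_change(early, late):
    """Signed change in FC: a positive value means FC increased."""
    return late - early

def magnitude_change(early, late):
    """Change in |FC|: a negative value means FC moved toward zero."""
    return abs(late) - abs(early)

delta_signed = signed_change(fc_by_age["5mo"], fc_by_age["24mo"])
delta_magnitude = magnitude_change(fc_by_age["5mo"], fc_by_age["24mo"])
```

      Under these assumed values the signed change is positive (FC increased) while the magnitude change is negative (the anti-correlation weakened), which is the distinction at issue in this comment.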

      (2) Are these negative values mentioned above at 24 months still negative? Have t-tests been run to examine the differences from zero?

      As suggested, we performed t-tests against zero for the mentioned FC at 24 months, and only the left and right fronto-middle FC are significantly different from zero (left fronto-middle FC: t(94) = 1.8, p = 0.036; right fronto-middle FC: t(94) = 2.7, p = 0.003).
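
      For readers less familiar with this test, the following is a minimal sketch of how a one-sample t statistic against zero and its degrees of freedom are obtained. The helper name `one_sample_t` and the FC values are hypothetical; the sketch does not compute a p-value and does not reproduce the study data (a reported t(94) corresponds to 95 infants).

```python
import math
import statistics

def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test against mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t_stat = (mean - mu) / (sd / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical FC values for one connection at 24 months (illustrative only).
fc_values = [0.10, -0.05, 0.20, 0.15, 0.00, 0.05, 0.12, -0.02]
t_stat, df = one_sample_t(fc_values)
```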

      (3) With so many correlation analyses, have multiple comparisons been consistently controlled for? While I assume this was done according to the Methods section, could the authors clarify whether FDR adjustment was applied to all the p-values at once or to a group of p-values each time? I found the following way of reporting FDR-adjusted p-values quite informative, such as PFDR, 24 pairs of ROIs < 0.05.

      We thank the reviewer for this insightful comment. P-values of regression analyses were FDR corrected per connection investigated, i.e. 21 possible ΔWLZ values per connection. We have specified this in the method section as follows: “To ensure statistical reliability, results from the regression analyses on each FC were corrected for multiple comparisons using false discovery rate (FDR)(Benjamini & Hochberg, 1995) per each connection investigated, i.e. 21 possible ΔWLZ values per each connection,” (page 12, Statistical Analyses section).
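
      As a minimal sketch of what correcting "per connection" means, the Benjamini-Hochberg adjustment can be applied separately to the family of p-values belonging to each connection. The function name, connection labels and p-values below are hypothetical, and the families are shortened for readability (in the analysis described above each family would hold 21 p-values).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, grouped by connection (one FDR family per connection).
raw_p = {
    "left front-middle": [0.001, 0.008, 0.039, 0.041, 0.6],
    "right front-back": [0.02, 0.5, 0.7],
}
adjusted_p = {conn: benjamini_hochberg(ps) for conn, ps in raw_p.items()}
```

      Because each connection forms its own family, the adjustment for one connection does not depend on the p-values of any other connection.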

      (4) Can early growth trajectories predict changes in FC? Why not use ΔWLZ to predict ΔFC?

      Unfortunately, when considering ΔWLZ to predict ΔFC, the number of participants dropped substantially, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing multiple measures.

      (5) I might have missed the rationale, but why weren't the growth changes after 5 months studied?

      ΔWLZ values between all time points were assessed as predictors of FC at 24 months. We have specified this at line 183 as follows: ‘we used multiple regression with the infant growth trajectory (delta weight for length z-score between all time points, ΔWLZ) and FC at 24 months’. As indicated in Tables 2 and 3, the associations between ΔWLZ at all time points and FC at 24 months were tested, but only those with ΔWLZ calculated between birth and 1 month and the subsequent time points were significant. ΔWLZ between 5 months and the subsequent time points, between 8 months and the subsequent time points, between 12 months and the subsequent time points, and between 18 months and the subsequent time points did not significantly predict FC at 24 months. These are highlighted in Table 2 and Figure 3 in blue and marked as NS (non-significant).
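
      A minimal sketch of how ΔWLZ between all pairs of time points can be enumerated is given below. The visit schedule is an assumption inferred from the time points named in this response (birth, 1, 5, 8, 12, 18 and 24 months), and the WLZ values are hypothetical; seven visits yield 21 pairs, which matches the 21 ΔWLZ values per connection mentioned above.

```python
from itertools import combinations

# Assumed visit schedule in months (0 = birth), inferred from this response.
visits = [0, 1, 5, 8, 12, 18, 24]

# Hypothetical WLZ values for one infant at each visit (illustrative only).
wlz = {0: -0.9, 1: -0.4, 5: -0.6, 8: -0.8, 12: -0.7, 18: -0.9, 24: -1.0}

# Delta WLZ for every (earlier, later) pair of visits: later minus earlier.
delta_wlz = {(a, b): wlz[b] - wlz[a] for a, b in combinations(visits, 2)}
```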

      (6) Once more, the advantage of longitudinal data is that it allows us to tap into developmental changes. Analyzing and predicting cognitive development based solely on FC values at a single age stage (i.e., 24 months) would overlook the benefits of a longitudinal design, which is regrettable. I suggest that the authors attempt to use ΔFC for predictions and observe the outcomes.

      As mentioned in our response to point (4) raised by the reviewer, unfortunately, when considering ΔWLZ to predict ΔFC, the number of participants dropped substantially, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing various measures.

      (7) In the section "Early FC predicts cognitive flexibility at preschool age", the authors pointed out that "...,none of these survived FDR correction for multiple comparisons." However, the paper discussed the association between FC at 24 months of age and cognitive flexibility, as it was supported by the statistical analysis in the following sections. If FDR correction cannot be satisfied, I would rephrase the implication/conclusion of the results to suggest that early FC does not predict cognitive flexibility at preschool age.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings, even those not passing multiple comparisons corrections, as they may motivate hypothesis-generation for future studies. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further support these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      Following the reviewer’s suggestion, we specified in the discussion that results from the regression analysis are significant but did not survive correction for multiple comparisons, as follows: ‘While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.’ (line 290, discussion section).

      (8) Have the authors assessed the impact of growth trajectories on cognitive flexibility?

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      (9) Are there no other cognitive or behavioural measures available? Cognitive flexibility is just one domain of cognitive development, and would the impact of undernutrition on cognitive development be domain-specific? There is a lack of theoretical support here. Why choose cognitive flexibility, and should the impact of undernutrition be domain-specific or domain-general?

      We agree with the reviewer that in this work, we chose to focus on one specific cognitive outcome. While this does not imply that the impact of undernutrition is domain-specific, cognitive flexibility, being a core executive function, has been extensively studied in terms of its neural underpinnings using other neuroimaging modalities, especially fMRI (for example see Dajani, 2015; Uddin, 2021).

      Moreover, other studies looking at the effect of adversity on cognitive outcomes focus on specific cognitive skills, such as working memory (Roberts, 2017) and reading and arithmetic skills (Soni, 2021).

      We also assessed infants with the Mullen Scales of Early Learning (MSEL); however, the cognitive flexibility task within the Early Years Toolbox has been specifically designed for preschoolers (Howard, 2015), and this set of tasks has recently been validated by our team in The Gambia (Milosavljevic, 2023). Future work from the BRIGHT team will investigate performance on the MSEL in relation to other variables of the project.

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      L. Q. Uddin, Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).

      Roberts, S. B., Franceschini, M. A., Krauss, A., Lin, P. Y., de Sa, A. B., Có, R., ... & Muentener, P. (2017). A pilot randomized controlled trial of a new supplementary food designed to enhance cognitive performance during prevention and treatment of malnutrition in childhood. Current developments in nutrition, 1(11), e000885.

      Soni, A., Fahey, N., Bhutta, Z. A., Li, W., Frazier, J. A., Moore Simas, T., ... & Allison, J. J. (2021). Early childhood undernutrition, preadolescent physical growth, and cognitive achievement in India: A population-based cohort study. PLoS Medicine, 18(10), e1003838.

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Milosavljevic, B., Cook, C. J., Fadera, T., Ghillia, G., Howard, S. J., Makaula, H., ... & Lloyd‐Fox, S. (2023). Executive functioning skills and their environmental predictors among pre‐school aged children in South Africa and The Gambia. Developmental Science, e13407.

      (10) I would review more previous fNIRS studies on infants if they exist (e.g., the work by S Lloyd-Fox, L Emberson, and others). These studies can help identify brain ROIs likely linked to undernutrition and cognitive flexibility. The current analysis methods lean towards exploratory research. This makes the paper more of a proof-of-concept report rather than a strongly theoretically-driven study.

      We thank the reviewer for this important point. While we have reviewed existing fNIRS infant studies, there are no extant works that show whether specific brain regions are related to undernutrition. However, several fMRI studies assessed regions that do support cognitive flexibility, and we mentioned these in the manuscript (for example see Dajani, 2015; Uddin, 2021).

      Other than the BRIGHT project, we are aware of two other projects that assessed the effect of undernutrition on brain development and cognitive outcomes in low-resource settings:

      - the BEAN project in Bangladesh, in which fNIRS data were recorded from the bilateral temporal cortex (e.g., Pirazzoli, 2022);

      - a project in India, in which fNIRS data were recorded from frontal, temporal and parietal cortex bilaterally (e.g., Delgado Reyes, 2020).

      The brain regions recorded in these studies largely overlap with the brain regions we recorded from in this study.

      Another aspect to consider is that infants underwent several fNIRS tasks as part of the BRIGHT project, focusing on social processing, deferred imitation, and habituation responses. Therefore, brain regions for data acquisition were chosen to maximize the likelihood of recording meaningful data for all tasks (Lloyd-Fox, 2023). To clarify the text, we specified this information in the methods section (line 383).

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      Pirazzoli, L., Sullivan, E., Xie, W., Richards, J. E., Bulgarelli, C., Lloyd-Fox, S., ... & Nelson III, C. A. (2022). Association of psychosocial adversity and social information processing in children raised in a low-resource setting: an fNIRS study. Developmental Cognitive Neuroscience, 56, 101125.

      Delgado Reyes, L., Wijeakumar, S., Magnotta, V. A., Forbes, S. H., & Spencer, J. P. (2020). The functional brain networks that underlie visual working memory in the first two years of life. NeuroImage, 219, Article 116971.

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      (11) Last but not least, in the paper, the authors mentioned that fNIRS offers better spatial resolution and anatomical specificity compared to EEG, thereby providing more precise and reliable localization of brain networks. While I partially agree with this perspective, it remains to be explored whether the current fNIRS analysis strategies indeed yield higher spatial resolution. It is hoped that the authors will delve deeper into this discussion in the paper.

      The brain regions of focus were selected based on coregistration work previously conducted at each time point on the array used in this project (Collins-Jones, 2021). We deliberately avoided making claims about small brain regions, considering that head size might increase slightly less with age in The Gambia compared to Western countries (Nabwera, 2017). However, we maintain that the conclusions drawn in this study offer higher brain-region specificity than could have been identified with current common EEG methods alone.

      References:

      L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021).

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      - Among important developmental mechanisms to mention are the development of exuberant connections and the further selection/stabilization of the relevant ones according to environmental stimulation, vs the pruning of others.

      We agree with the reviewer that the development of exuberant connections and subsequent pruning is a universal process of paramount importance during the first years of life. However, after revising our introduction, given the word limit of the journal, we maintained focus on neurodevelopment and early adversity.

      Results

      - Adding a few more information on the 6 sections and 21 connections would be welcome. In particular for within-section FC: how was this computed?

      The 6 sections were created based on the co-registration of the array used in this study at each age, reported in previously published work (L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’.

      The 21 connections were defined as all the possible links between the 6 regions, specifically: the interhemispheric homotopic connections (in orange in Figure SI1), which connect the same regions between hemispheres (i.e., front left with front right); the intrahemispheric connections (in green in Figure SI1), which correlate channels belonging to the same region; the fronto-posterior connections (in blue in Figure SI1), which link front and middle, middle and back, and front and back regions of the same hemisphere; and the crossing interhemispheric connections (non-homotopic interhemispheric, in yellow in Figure SI1), which link the front, middle, and back areas between left and right hemispheres. We added these specifications also in the legend of Figure SI1 for clarity.
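
      The count of 21 follows from taking every unordered pair of the 6 regions, including each region paired with itself. A minimal sketch of this enumeration is shown below; the region tuples, the `classify` helper and the category labels are illustrative names chosen here, not identifiers from the analysis code.

```python
from itertools import combinations_with_replacement

# Six regions: front, middle and back in each hemisphere.
regions = [(hemi, part) for hemi in ("left", "right")
           for part in ("front", "middle", "back")]

def classify(a, b):
    """Assign a region pair to one of the four connection types described above."""
    (hemi_a, part_a), (hemi_b, part_b) = a, b
    if a == b:
        return "intrahemispheric (within-region)"
    if hemi_a == hemi_b:
        return "fronto-posterior"
    if part_a == part_b:
        return "interhemispheric homotopic"
    return "crossing interhemispheric"

# All unordered pairs, including each region with itself.
connections = {pair: classify(*pair)
               for pair in combinations_with_replacement(regions, 2)}

counts = {}
for label in connections.values():
    counts[label] = counts.get(label, 0) + 1
```

      This gives 6 within-region, 6 fronto-posterior, 3 homotopic and 6 crossing connections, for a total of 21.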

      - The denomination intrahemispheric vs fronto-posterior vs crossed connections is not clear. Maybe prefer intra-hemispheric vs inter-hemispheric homotopic vs inter-hemispheric non-homotopic (also in Figure SI1).

      We appreciate the reviewer's suggestion regarding terminology. However, we believe that the term 'inter-hemispheric non-homotopic' could potentially refer to both connections within the same brain hemisphere from front to back and connections crossing between hemispheres, leading to increased confusion. Therefore, we have chosen not to include the term 'non-homotopic' and instead added 'homotopic' to 'interhemispheric' throughout the manuscript to emphasize that these functional connections occur between corresponding regions of the two hemispheres.

      - with time -> with age.

      We replaced “with time” with “with age” throughout the manuscript, as suggested.

      - The description of both HbO2 and HHb results overloads the main text: would it be relevant to present one of the two in Supplementary Information if the results are coherent?

      We understand the reviewer’s concern regarding overloading the results section with reporting both chromophores. However, reporting results for both HbO and HHb is considered a crucial step for publications in the fNIRS field, as emphasized in recent formal guidance (Yücel et al., 2020). One of the strengths of fNIRS compared to fMRI is its ability to record from both chromophores, enabling a more precise characterization of brain activations and oscillations. Moreover, in FC studies like this one, ensuring that HbO and HHb results overlap is an important check that increases confidence in interpreting the findings.

      References:

      Yücel, M. A., von Lühmann, A., Scholkmann, F., Gervain, J., Dan, I., Ayaz, H., Boas, D., Cooper, R. J., Culver, J., Elwell, C. E., Eggebrecht, A. T., Franceschini, M. A., Grova, C., Homae, F., Lesage, F., Obrig, H., Tachtsidis, I., Tak, S., Tong, Y., … Wolf, M. (2020). Best Practices for fNIRS publications. Neurophotonics, 1–34. https://doi.org/10.1117/1.NPh.8.1.012101

      - HCZ is not defined when first used.

      We thank the reviewer for spotting this; we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Choosing the analyzed measures to "maximize power" could be criticised.

      We appreciate the reviewer’s concern. However, correlating all the FC values with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and those FC that showed a significant change with age, considering these as the most interesting ones from a developmental perspective in our sample.

      Discussion

      - I would recommend using the same order to synthesize results and further discuss them.

      We agree with the reviewer that the suggested structure is optimal for a clear discussion section. We have indeed followed it, with each paragraph covering specific aspects:

      - Recap of the study aims

      - Results summary and discussion of developmental changes

      - Results summary and discussion of the relationship between changes in growth and FC

      - Results summary and discussion of the relationship between FC and cognitive flexibility

      - Limitations

      - Conclusion

      Given the numerous results presented in this paper, we believe that readers will better digest them by first reading a summary of the results followed by their interpretations, rather than condensing all the interpretations together.

      - Highlighting how "atypical" developmental trajectories are in Gambian infants would be welcome in the Results section. Other interpretations can be found than "The observed decrease in frontal inter-hemispheric FC with increasing age may be due to the exposure to early life undernutrition adversity".

      We agree with the reviewer that other factors that differ between low- and high-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to further investigate cultural, environmental, and genetic effects on brain FC” (line 238).

      - Focusing on FC at 24m for the relationship with growth is questionable.

      Correlating the FC values at 5 time points with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and FC at 24 months as our final time point of data collection. We added this information in the methods section as follows: “To investigate the impact of undernutrition on FC development, we used ΔWLZ as independent variables in regression analyses on HbO2 (as the chromophore with the highest signal-to-noise ratio) FC at 24 months, our final time point of data collection” (line 517, method section).

      - There is too much emphasis on the correlation between FC and cognitive flexibility, whereas results are not significant after correction for multiple comparisons.

      Following the reviewer’s suggestion, we specified in the discussion that the regression results are significant but did not survive correction for multiple comparisons, as follows: “While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.” (line 290, discussion section).

      Methods

      - I would recommend detailing how z-scores were computed in the paragraph "Anthropometric measures".

      We specified how z-scores were computed in the statistical analysis section as follows: “Anthropometric measures were converted to age and sex adjusted z‐scores that are based on World Health Organization Child Growth Standards (93). Weight‐for‐Length (WLZ) and Head Circumference (HCZ) z-scores were computed” (line 509, method section). As transforming data is the first step of statistical analysis and is not directly related to data collection, we believe it is more appropriate to retain this description in the statistical analysis section.
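
      For readers unfamiliar with the conversion, the WHO Child Growth Standards are published as age- and sex-specific LMS parameters (Box-Cox power L, median M, coefficient of variation S). A minimal Python sketch of the conversion follows, assuming the standard LMS formula; the parameter values in the example are hypothetical placeholders rather than entries from the WHO tables, and the sketch omits the adjustment that the WHO applies to extreme z-scores for some indicators.

```python
import math


def lms_zscore(value: float, L: float, M: float, S: float) -> float:
    """Convert a measurement to a z-score with the LMS method.

    L, M and S must be looked up for the child's age (or length) and sex.
    """
    if value <= 0 or M <= 0 or S <= 0:
        raise ValueError("value, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case of the Box-Cox transform when L is zero.
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


# Hypothetical parameters, for illustration only (not WHO table values).
z = lms_zscore(value=11.0, L=1.0, M=10.0, S=0.1)
print(round(z, 2))
```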

      - FC computation: the mention of "correlating the first and the last 250s" is not clear.

      We specified this more clearly in the text as follows: “We found that correlating the first and the last 250 seconds of valid data after pre-processing provided the highest percentage of infants with strong correlation between the first and the last portion of data” (line 467).
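
      One plausible reading of this check is a split-half comparison: compute a channel-by-channel FC pattern from the first 250 s and from the last 250 s of valid data, then correlate the two patterns. The Python sketch below illustrates that reading on synthetic data. The function names, the 1 Hz sampling rate, the four-channel layout and the requirement that the two windows do not overlap are all assumptions made for illustration, not details taken from the manuscript.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_vector(channels):
    """Pairwise correlations between channels (upper triangle, flattened)."""
    n = len(channels)
    return [pearson(channels[i], channels[j])
            for i in range(n) for j in range(i + 1, n)]


def split_half_fc_correlation(channels, fs_hz, window_s=250.0):
    """Correlate the FC pattern of the first window with that of the last."""
    n = int(round(window_s * fs_hz))
    if any(len(ch) < 2 * n for ch in channels):
        raise ValueError("need two non-overlapping windows of valid data")
    first = [ch[:n] for ch in channels]
    last = [ch[-n:] for ch in channels]
    return pearson(fc_vector(first), fc_vector(last))


# Synthetic example: two pairs of channels, each pair sharing one source.
rng = random.Random(0)
n_samples = 600  # 600 s at a hypothetical 1 Hz sampling rate
src_a = [rng.gauss(0, 1) for _ in range(n_samples)]
src_b = [rng.gauss(0, 1) for _ in range(n_samples)]


def noisy(src):
    return [s + rng.gauss(0, 0.5) for s in src]


channels = [noisy(src_a), noisy(src_a), noisy(src_b), noisy(src_b)]
r = split_half_fc_correlation(channels, fs_hz=1.0)
print(round(r, 3))
```

      A per-infant value of this kind could then be thresholded to count the infants with a "strong" correlation, which is the percentage the response refers to.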

      - The manuscript mentions "age 3 years" for the younger preschoolers but ~48months rather corresponds to 4 years.

      We reviewed the entire manuscript and the supplementary materials, but we could not find any instance in which preschoolers are referred to by age in months rather than in years.

      - Specify the number of children evaluated at 4 and 5 years. Is the test of cognitive flexibility normalized for age? If not, how were the 2 groups considered in the analyses? (age as a confounding factor).

      We have added the number of children in the two preschooler groups as follows: “younger preschoolers (age mean ± SD=47.96 ± 2.77 months, N=77) and older preschoolers (age mean ± SD=57.58 ± 2.11 months, N=84)” (line 484).

      The cognitive flexibility test was not normalized for age, as this task was specifically developed for preschoolers (Howard, 2015). As mentioned in ‘Cognitive flexibility at preschool age’ of the methods section, “data were collected in two ranges of preschool ages”, which guided our decision to perform regression analysis on the impact of FC on cognitive flexibility separately within these two age groups, rather than treating them as a single group of preschoolers.

      References:

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Figures and Tables

      - Table 1 could highlight the significant results. It is not clear what the "baseline" results correspond to.

      We have marked in bold the results that are statistically significant in Table 1. In the linear mixed model we performed, the first time point (i.e. 5 months) is chosen as ‘baseline’, i.e. the reference against which the other time points are compared, and its statistical values refer to its significance against 0 (as was done in Bulgarelli 2020).
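
      The role of the ‘baseline’ row can be illustrated with treatment (reference) coding. The Python sketch below is a deliberate simplification: it ignores the random effects of a mixed model and uses made-up FC values. Under those assumptions, the intercept equals the mean at the reference age (the quantity tested against 0), and every other coefficient is a difference from that mean.

```python
from statistics import mean

# Hypothetical FC values (arbitrary units) at two ages in months.
fc_by_age = {
    5: [0.30, 0.32, 0.34],
    24: [0.25, 0.27, 0.29],
}


def treatment_coded_effects(groups, reference):
    """Fixed-effect estimates under reference coding, ignoring random effects."""
    ref_mean = mean(groups[reference])
    effects = {"intercept": ref_mean}
    for level, values in groups.items():
        if level != reference:
            # Each non-reference coefficient is a contrast with the baseline.
            effects[level] = mean(values) - ref_mean
    return effects


effects = treatment_coded_effects(fc_by_age, reference=5)
print(effects)
```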

      - Figures 2 B and C seem redundant? What is SE vs SD?

      We believe that both Figures 2B and 2C are useful for readers. While the first shows the mean FC values at the group level, the second highlights the individual variability of FC values (typical of infant neuroimaging data), which is also why it is interesting to relate these measures to other variables in our dataset (i.e. growth and cognitive flexibility). Figure 2C also reports mean FC values per age, but these might be less visible given that one dot per infant is also plotted.

      SE stands for standard error, and in the legend of the figure we specified this as follows: ‘Mean and standard error of the mean (SE)’. SD stands for standard deviation, and we have now specified this as follows: ‘mean ± standard deviation (SD)’.

      - Table 2: I would recommend removing results that don't survive corrections for multiple comparisons.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      - Figure 3: the top is redundant with Table 2: to be merged? B: the statistical results might be shown in a Table.

      We agree with the reviewer that the top part of Figure 3 and Table 2 report the same results. However, given the richness of these findings, we believe that the top part of Figure 3 serves as a useful summary for readers. Additionally, examining both the top and bottom parts of Figure 3 provides a comprehensive overview of the regression analysis conducted in this study.

      - Figure SI6: Is it really a % in x-axis?

      We thank the reviewer for spotting this typo; the percentage is relevant for the y-axis only. We removed the % symbol from the ticks of the x-axis.

      - Table SI1: the presented p-values don't seem to survive Bonferroni correction, contrary to what is written.

      We thank the reviewer for spotting this mistake; we removed the reference to the Bonferroni correction for the p-values.

      - Table SI2: For the proportion of children included in the analysis, maybe specify that the proportion was computed based on the ones with acquired data. Maybe also add the proportion according to all children, to better show the high drop-out rate at certain ages?

      We thank the reviewer for these useful suggestions. We have specified in the legend of the table how we calculated the proportion of infants included as follows: ‘The proportion of children included in the analysis was computed based on the infants with FC data’. We have also added a column in the table called ‘Inclusion rate (from the 204 infants recruited)’, following the reviewer’s suggestion. This will be a useful reference for future studies.
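
      The two proportions described here differ only in their denominator. A small Python sketch follows; the counts of infants with FC data and of infants included are hypothetical, and only the figure of 204 recruited infants is taken from the response.

```python
N_RECRUITED = 204  # number of infants recruited, as stated in the response


def inclusion_rates(n_included, n_with_fc_data, n_recruited=N_RECRUITED):
    """Return inclusion proportions relative to two different denominators."""
    if not 0 <= n_included <= n_with_fc_data <= n_recruited:
        raise ValueError("expected included <= with FC data <= recruited")
    if n_with_fc_data == 0:
        raise ValueError("no infants with FC data at this age")
    return {
        "of_infants_with_fc_data": n_included / n_with_fc_data,
        "of_all_recruited": n_included / n_recruited,
    }


# Hypothetical counts for one age point.
rates = inclusion_rates(n_included=102, n_with_fc_data=136)
print(rates)
```

      Reporting both values separates data rejected during pre-processing from attrition or missed sessions, which is the distinction the reviewer asked for.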

      - A few typos should be corrected throughout the manuscript.

      We thoroughly revised the main manuscript and the supplementary materials for typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Building on previous in vitro synaptic circuit work (Yamawaki et al., eLife 10, 2021), Piña Novo et al. utilize an in vivo optogenetic-electrophysiological approach to characterize sensory-evoked spiking activity in the mouse's forelimb primary somatosensory (S1) and motor (M1) areas. Using a combination of a novel "phototactile" somatosensory stimulus to the mouse's hand and simultaneous high-density linear array recordings in both S1 and M1, the authors report in awake mice that evoked cortical responses follow a triphasic peak-suppression-rebound pattern. They also find that M1 responses are delayed and attenuated relative to S1. Further analysis revealed a 20-fold difference in subcortical versus corticocortical propagation speeds.

      They also report that PV interneurons in S1 are strongly recruited by hand stimulation. Furthermore, they report that selective activation of PV cells can produce a suppression and rebound response similar to "phototactile" stimuli. Lastly, the authors demonstrate that silencing S1 through local PV cell activation reduces M1 response to hand stimulation, suggesting S1 may directly drive M1 responses.

      Strengths:

      The study was technically well done, with convincing results. The data presented are appropriately analyzed. The authors' findings build on a growing body of both in vitro and in vivo work examining the synaptic circuits underlying the interactions between S1 and M1. The paper is well-written and illustrated. Overall, the study will be useful to those interested in forelimb S1-M1 interactions.

      Weaknesses:

      Although the results are clear and convincing, one weakness is that many results are consistent with previous studies in other sensorimotor systems, and thus not all that surprising. For example, the findings that sensory stimulation results in delayed and attenuated responses in M1 relative to S1 and that PV inhibitory cells in S1 are strongly recruited by sensory stimulation are not novel (e.g., Bruno et al., J Neurosci 22, 10966-10975, 2002; Swadlow, Philos Trans R Soc Lond B Biol Sci 357, 1717-1727, 2002; Gabernet et al., Neuron 48, 315-327, 2005; Cruikshank et al., Nat Neurosci 10, 462-468, 2007; Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016; Yu et al., Neuron 104, 412-427 e414, 2019). Furthermore, the observation that sensory processing in M1 depends upon activity in S1 is also not novel (e.g., Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016). The authors do a good job highlighting how their results are consistent with these previous studies.

      We thank the reviewer for the close reading of the manuscript and the many constructive comments and critiques. As the reviewer notes, there have been many prior studies of related circuits in other sensorimotor systems forming an important context for our study and findings, as we have tried to highlight. We appreciate the suggestions for additional relevant articles to cite.

      Perhaps a more significant weakness, in my opinion, was the missing analyses given the rich dataset collected. For example, why lump all responsive units and not break them down based on their depth? Given superficial and deep layers respond at different latencies and have different response magnitudes and durations to sensory stimuli (e.g., L2/3 is much more sparse) (e.g., Constantinople et al., Science 340, 1591-1594, 2013; Manita et al., Neuron 86, 1304-1316, 2015; Petersen, Nat Rev Neurosci 20, 533-546, 2019; Yu et al., Neuron 104, 412-427 e414, 2019), their conclusions could be biased toward more active layers (e.g., L4 and L5). These additional analyses could reveal interesting similarities or important differences, increasing the manuscript's impact. Given the authors use high-density linear arrays, they should have this data.

      We have analyzed the activity patterns as a function of cortical depth, and now include these results in the manuscript as suggested. The key new finding is that the M1 responses are strongest in upper layers, consistent with expectations based on the excitatory corticocortical synaptic connectivity characterized previously. Changes to the manuscript include new figures (Figure 5; Figure 5 - figure supplement 1), which we explain (Methods: page 14, lines 618-621), describe (new Results section: pages 4-5, lines 183-189), comment on (Discussion: page 9, lines 378-391), and summarize the significance of (Abstract: page 1, lines 22-24). In addition, we incorporated the new laminar analysis into a summary schematic (Figure 9). We thank the reviewer for suggesting this analysis.

      Similarly, why not isolate and compare PV versus non-PV units in M1? They did the photostimulation experiments and presumably have the data. Recent in vitro work suggests PV neurons in the upper layers (L2/3) of M1 are strongly recruited by S1 (e.g., Okoro et al., J Neurosci 42, 8095-8112, 2022; Martinetti et al., Cerebral cortex 32, 1932-1949, 2022). Do the authors' data support these in vitro observations?

      These experiments were relatively complex and M1 optotagging was not routinely included in the stimulus and acquisition protocol. Therefore, we don’t have sufficient data for this analysis. We plan to address this in future studies.

      It would have also been interesting to suppress M1 while stimulating the hand to determine if any part of the S1 triphasic response depends on M1 feedback.

      We agree that this is of interest but consider this to be outside the scope of the current study.

      I appreciate the control experiment showing that optical hand stimulation did not evoke forelimb movement. However, this appears to be an N=1. How consistent was this result across animals, and how was this monitored in those animals? Can the authors say anything about digit movement?

      We have performed additional experiments to address this point. A constraint with EMG is that it is limited to the muscle(s) one chooses to record from, and it is difficult to implant tiny muscles of the hand. Therefore, for this analysis, we used kilohertz videography as a high-sensitivity method for movement surveillance across the hand. Hand stimulation did not evoke any detectable movements. Changes in the manuscript include: revised Figure 1 - figure supplement 1; supplementary Figure 1 - video 1; and associated text edits in the Methods (page 13, line 557; page 14, lines 626-639) and Results sections (page 2, lines 84-85).

      A light intensity of 5 mW was used to stimulate the hand, but it is unclear how or why the authors chose this intensity. Did S1 and M1 responses (e.g., amplitude and latency) change with lower or higher intensities? Was the triphasic response dependent on the intensity of the "phototactile" stimuli?

      As we now say in the Methods > Optogenetic photostimulation of the hand section (page 13, lines 562-565), “This intensity was chosen based on pilot experiments in which we varied the LED power, which showed that this intensity was reliably above the threshold for evoking robust responses in both S1 and M1 without evoking any visually detectable movements (as subsequently confirmed by videography)”.

      Reviewer #2 (Public review):

      Summary:

      Communication between sensory and motor cortices is likely to be important for many aspects of behavior, and in this study, the authors carefully analyse neuronal spiking activity in S1 and M1 evoked by peripheral paw stimulation, finding clear evidence for sensory responses in both cortical regions.

      Strengths:

      The experiments and data analyses appear to have been carefully carried out and clearly represented.

      Weaknesses:

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Thank you for pointing this out. The prior studies suggest it is mainly a subset of layer 5B excitatory neurons that may express PV. We checked this in two ways. Anatomically, we did not find double-labeling. An electrophysiology assay showed that, although some evoked excitatory synaptic input could be detected in some neurons, these inputs were very weak. Results from these assays are shown in new Figure 6 - figure supplement 1, with associated text edits in the Methods (page 11, lines 469-471; page 15, lines 657-668) and Results (page 5, lines 198-199) sections.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      As noted above, we have performed additional experiments to address this.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      Thank you for pointing this out; we now cite this article (page 1, line 46; page 10, line 415).

      Reviewer #3 (Public review):

      Summary:

      This is a solid study of stimulus-evoked neural activity dynamics in the feedforward pathway from mouse hand/forelimb mechanoreceptor afferents to S1 and M1 cortex. The conclusions are generally well supported, and match expectations from previous studies of hand/forelimb circuits by this same group (Yamawaki et al., 2021), from the well-studied whisker tactile pathway to whisker S1 and M1, and from the corresponding pathway in primates. The study uses the novel approach of optogenetic stimulation of PV afferents in the periphery, which provides an impulse-like volley of peripheral spikes, which is useful for studying feedforward circuit dynamics. These are primarily proprioceptors, so results could differ for specific mechanoreceptor populations, but this is a reasonable tool to probe basic circuit activation. Mice are awake but not engaged in a somatosensory task, which is sufficient for the study goals.

      The main results are:

      (1) brief peripheral activation drives brief sensory-evoked responses at ~15 ms latency in S1 and ~25 ms latency in M1, which is consistent with classical fast propagation on the subcortical pathway to S1, followed by slow propagation on the polysynaptic, non-myelinated pathway from S1 to M1;

      (2) each peripheral impulse evokes a triphasic activation-suppression-rebound response in both S1 and M1;

      (3) PV interneurons carry the major component of spike modulation for each of these phases;

      (4) activation of PV neurons in each area (M1 or S1) drives suppression and rebound both in the local area and in the other downstream area;

      (5) peripheral-evoked neural activity in M1 is at least partially dependent on transmission through S1.

      All conclusions are well-supported and reasonably interpreted. There are no major new findings that were not expected from standard models of somatosensory pathways or from prior work in the whisker system.

      Strengths:

      This is a well-conducted and analyzed study in which the findings are clearly presented. This will provide important baseline knowledge from which studies of more complex sensorimotor processing can build.

      Weaknesses:

      A few minor issues should be addressed to improve clarity of presentation and interpretation:

      (1) It is critical for interpretation that the stimulus does not evoke a motor response, which could induce reafference-based activity that could drive, or mask, some of the triphasic response. Figure S1 shows that no motor response is evoked for one example session, but this would be stronger if results were analyzed over several mice.

      As noted above, we have performed additional experiments to address this point.

      (2) The recordings combine single and multi-units, which is fine for measures of response modulation, but not for absolute evoked firing rate, which is only interpretable for single units. For example, evoked firing rate in S1 could be higher than M1, if spike sorting were more difficult in S1, resulting in a higher fraction of multi-units relative to M1. Because of this, if reporting of absolute firing rates is an essential component of the paper, Figs 3D and 4E should be recalculated just for single units.

      Thank you for noting this. Although the absolute firing rates are not essential for the main findings or conclusions (which as noted focus on response modulations and relative differences) we agree that analyzing the single-unit response amplitudes is useful. Therefore, changes in the manuscript now include: revised Figure 3, and associated text edits in the Methods (page 12, lines 543-545), Results (page 3, lines 115-119), and Discussion (page 7, lines 305-311) sections.

      (3) In Figure 5B, the average light-evoked firing rate of PV neurons seems to come up before time 0, unlike the single-trial rasters above it. Presumably, this reflects binning for firing rate calculation. This should be corrected to avoid confusion.

      Yes, this reflects the binning. We agree that this is potentially confusing and have removed these average plots below the raster plots, as the rasters alone suffice to demonstrate the result (i.e., that PV units are strongly activated and thus tagged by optogenetic stimulation). Changes are now reflected in revised Figure 6.
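
      One way binning can produce an apparent pre-stimulus rise is when a histogram bin straddles time 0, so that post-stimulus spikes are counted in a bin that begins before the stimulus. The Python sketch below illustrates this with made-up spike times; the bin width and edge placement are assumptions, since the response does not specify how the original averages were computed.

```python
def bin_counts(spike_times_ms, first_edge_ms, bin_width_ms, n_bins):
    """Count spikes in consecutive half-open bins [edge, edge + width)."""
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int((t - first_edge_ms) // bin_width_ms)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


# Made-up spikes, all occurring after stimulus onset at 0 ms.
spikes = [1.0, 2.0, 3.5, 4.0]

# Bins [-15, -5), [-5, 5), [5, 15): the middle bin straddles time 0,
# so its count appears to start rising at -5 ms.
straddling = bin_counts(spikes, first_edge_ms=-15.0, bin_width_ms=10.0, n_bins=3)

# Bins [-20, -10), [-10, 0), [0, 10): edges aligned to stimulus onset,
# so no pre-stimulus bin contains a spike.
aligned = bin_counts(spikes, first_edge_ms=-20.0, bin_width_ms=10.0, n_bins=3)

print(straddling, aligned)
```

      Aligning a bin edge with stimulus onset, or showing only the rasters as the authors chose to do, avoids the misleading appearance.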

      (4) In Figure 6A bottom, please clarify what legends "W. suppression" and "W. rebound" mean.

      In the figure plot legends, the “W.” has been removed. Changes are now reflected in revised Figure 7 and Figure 7 – figure supplement 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Did you filter the neural signals during acquisition? If so, please include these details in the results.

      Signals were bandpass-filtered (2.5 Hz to 7.6 kHz) at the hardware level at acquisition (with no additional software filtering applied), as now clarified in the Methods > Electrophysiological recordings section as requested (page 12, lines 525-526).

      Reviewer #2 (Recommendations for the authors):

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Please see above for our response to this issue.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      Please see above for our response to this issue.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      As noted above, we now cite this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors discovered MYL3 of marine medaka (Oryzias melastigma) as a novel NNV entry receptor, elucidating its facilitation of RGNNV entry into host cells through macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 pathway.

      Strengths:

      In this manuscript, the authors have performed in vitro and in vivo experiments to prove that MmMYL3 may serve as a receptor for NNV via the macropinocytosis pathway. These experiments use different methods, including Co-IP, RNAi, pulldown, SPR, flow cytometry, immunofluorescence assays, and so on. In general, the results are clearly presented in the manuscript.

      Weaknesses:

      For the writing in the introduction and discussion sections, the authors (Yao et al.) mainly focus on viral pathogens and fish in aquaculture; the meaning and novelty of the results provided in this manuscript are limited and not broad in biology. The authors should improve the likely impact of their work on the viral infection field, and perhaps also on the evolutionary field with the fish model.

      (1) Myosin is a big family, why did authors choose MYL3 as a candidate receptor for NNV?

      We appreciate your insightful question. We selected MYL3 as a candidate receptor based on a combination of proteomic screening, literature evidence, and functional validation. Increasing evidence implicates myosins in viral infections. For instance, myosin heavy chain 9 plays a role in multiple viral infections (Li et al., 2018), and non-muscle myosin heavy chain IIA has been identified as an entry receptor for herpes simplex virus-1 (Arii et al., 2010). Furthermore, myosin II light chain activation is essential for influenza A virus entry via macropinocytosis (Banerjee et al., 2014). Our previous studies hinted at a potential interaction between MYL3 and CP (Zhang et al., 2020). Huang et al. also reported that Epinephelus coioides MYL3 might interact with native NNV CP, based on proteomic analysis of an immunoprecipitation (IP) assay (Huang et al., 2020). Our Co-IP and SPR analyses confirmed a direct interaction between MYL3 and the RGNNV CP. Based on these studies, we selected MYL3 as a candidate receptor for NNV.

      References

      Huang PY, Hsiao HC, Wang SW, Lo SF, Lu MW, Chen LL. 2020. Screening for the Proteins That Can Interact with Grouper Nervous Necrosis Virus Capsid Protein. Viruses 12:1–20.

      Li L, Xue B, Sun W, Gu G, Hou G, Zhang L, Wu C, Zhao Q, Zhang Y, Zhang G, Hiscox JA, Nan Y, Zhou EM. 2018. Recombinant MYH9 protein C-terminal domain blocks porcine reproductive and respiratory syndrome virus internalization by direct interaction with viral glycoprotein 5. Antiviral Res 156:10–20.

      Arii J, Goto H, Suenaga T, Oyama M, Kozuka-Hata H, Imai T, Minowa A, Akashi H, Arase H, Kawaoka Y, Kawaguchi Y. 2010. Non-muscle myosin IIA is a functional entry receptor for herpes simplex virus-1.

      Banerjee I, Miyake Y, Philip Nobs S, Schneider C, Horvath P, Kopf M, Matthias P, Helenius A, Yamauchi Y. 2014. Influenza A virus uses the aggresome processing machinery for host cell entry. Science 346:473–477.

      (2) What is the relationship between MmMYL3 and MmHSP90ab1 and other known NNV receptors? Why does NNV have so many receptors? Which one is supposed to serve as the key entry receptor?

      We acknowledge the functional diversity of receptors for NNV. MmHSP90ab1 and MmHSC70 have been identified as receptors involved in NNV entry through clathrin-mediated endocytosis (CME), whereas MYL3 facilitates entry via macropinocytosis. These pathways serve as complementary mechanisms for the virus to enter host cells, potentially enhancing infection efficiency. While HSP90ab1 facilitates CME, MYL3 promotes macropinocytosis, both of which are critical for viral internalization, but through distinct endocytic mechanisms.

      NNV likely utilizes multiple receptors to increase its host range and infection efficiency. The diversity of receptors ensures that the virus can infect a wide variety of host species. By employing HSP90ab1, HSC70, and MYL3, NNV can exploit different cellular pathways for entry, making it more adaptable to various host environments.

      Regarding the identification of a key entry receptor, we agree this is a critical unresolved question. While HSP90ab1/HSC70 appear essential for CME-mediated entry, our data suggest MYL3 plays a distinct role in macropinocytic uptake. To systematically evaluate receptor hierarchy, we initially proposed comparative knockout studies targeting these candidate genes. However, we must acknowledge that current technical limitations in marine fish models – particularly the extended generation time for stable knockout cell lines and challenges in maintaining viable cell cultures post-editing – have delayed these experiments. Nevertheless, we are actively exploring strategies to overcome these obstacles and will continue to refine our approach to address these questions in future research.

      (3) In vivo knockout of MYL3 using CRISPR-Cas9 should be conducted to verify whether the absence of MYL3 really inhibits NNV infection. Although it might be difficult to do it in marine medaka as stated by the authors, the introduction of zebrafish is highly recommended, since it has already been reported that zebrafish could serve as a vertebrate model to study NNV (doi: 10.3389/fimmu.2022.863096).

      As noted in our manuscript (lines 374-384), marine medaka is a relatively new model for studying viral infections and is not yet optimized for CRISPR-Cas9-mediated gene knockout. The technical challenges related to precise embryo microinjection and off-target effects using CRISPR-Cas9 in marine medaka complicate the establishment of knockout lines. These limitations, including the time required for multiple breeding generations and molecular screening, currently make this approach difficult to implement.

      We fully agree with your suggestion to consider zebrafish as an alternative model. Zebrafish have been well-established as a vertebrate model for studying NNV, and their genetic tractability and well-developed CRISPR-Cas9 protocols provide a more accessible and efficient platform for generating knockout models. In our future studies, we plan to conduct CRISPR-Cas9-mediated knockout experiments targeting multiple NNV receptors in zebrafish. This will allow us to systematically evaluate the role of different receptors in NNV infection and elucidate their potential interactions. The findings from these studies will be included in a future publication, which will provide a more comprehensive understanding of the molecular mechanisms underlying NNV infection in vertebrate models.

      (4) The results shown in Figure 6 are not enough to support the conclusion that "RGNNV triggers macropinocytosis mediated by MmMYL3". Additional electron microscopy of macropinosomes (sizes, morphological characteristics, etc.) will be more direct evidence.

      A previous study reported that dragon grouper nervous necrosis virus (DGNNV) enters SSN-1 cells primarily through micropinocytosis and macropinocytosis pathways. Electron microscopy revealed several kinds of membrane ruffling and large, disproportionate macropinosomes in DGNNV-infected cells, indicating that NNV infection can trigger macropinocytosis (Liu et al., 2005). In our study, the data from inhibitor treatments, co-localization of MmMYL3 with RGNNV CP, and dextran uptake assays also provide compelling evidence for the involvement of macropinocytosis in RGNNV entry via MmMYL3. These methods are well-established in the literature and have been used extensively to study viral entry pathways (Lingemann et al., 2019). Specifically, the dextran uptake assay has been widely utilized as a marker for macropinocytosis and has provided clear evidence of RGNNV internalization via this pathway. The use of macropinocytosis inhibitors, such as EIPA and Rottlerin, significantly reduced RGNNV entry, further supporting our conclusion. Nonetheless, we acknowledge the potential value of additional electron microscopy studies and will consider this approach in our future research.

      References

      Liu W, Hsu CH, Hong YR, Wu SC, Wang CH, Wu YM, Chao CB, Lin CS. 2005. Early endocytosis pathways in SSN-1 cells infected by dragon grouper nervous necrosis virus. J Gen Virol.

      Lingemann M, McCarty T, Liu X, Buchholz UJ, Surman S, Martin SE, Collins PL, Munir S. 2019. The alpha-1 subunit of the Na+,K+-ATPase (ATP1A1) is required for macropinocytic entry of respiratory syncytial virus (RSV) in human respiratory epithelial cells. PLoS Pathogens.

      (5) MYL3 is "predominantly found in muscle tissues, particularly the heart and skeletal muscles". However, NNV is a virus that mainly causes necrosis of nervous tissues (brain and retina). If MYL3 really acts as a receptor for NNV, how does it balance this difference so that nervous tissues, rather than muscle tissues, have the highest viral titers?

      While MYL3 is highly expressed in cardiac and skeletal muscles, studies have shown that MYL3, like other myosin light chains, can also be present in non-muscle tissues. Additionally, proteins involved in viral entry do not always need to be the most highly expressed in the final target tissue, as long as they facilitate the initial infection process. For instance, rabies virus is a rhabdovirus that exhibits marked neurotropism in infected animals. Transferrin receptor protein 1 (TfR1) can serve as a receptor for rabies virus through the clathrin-mediated endocytosis (CME) pathway, yet TfR1 is expressed most abundantly in liver tissue, not in the nervous system (Wang et al., 2023).

      Viral tropism is often determined not only by the presence of an entry receptor but also by co-receptors, cellular factors, and post-entry mechanisms. While MYL3 may act as a receptor for NNV, other factors, such as cell-specific proteases, signaling molecules, and intracellular trafficking pathways, likely contribute to NNV’s preferential replication in the brain and retina.

      Reference

      Wang Xinxin, Wen Z, Cao H, Luo J, Shuai L, Wang C, Ge J, Wang Xijun, Bu Z, Wang J. 2023. Transferrin Receptor Protein 1 Is an Entry Factor for Rabies Virus. J Virol 97. doi:10.1128/jvi.01612-22

      Reviewer #2 (Public review):

      Summary:

      The manuscript offers an important contribution to the field of virology, especially concerning NNV entry mechanisms. The major strength of the study lies in the identification of MmMYL3 as a functional receptor for RGNNV and its role in macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 signaling axis. This represents a significant advance in understanding NNV entry mechanisms beyond previously known receptors such as HSP90ab1 and HSC70. The data, supported by comprehensive in vitro and in vivo experiments, strongly justify the authors' claims about MYL3's role in NNV infection in marine medaka.

      Strengths:

      (1) The identification of MmMYL3 as a functional receptor for RGNNV is a significant contribution to the field. The study fills a crucial gap in understanding the molecular mechanisms governing NNV entry into host cells.

      (2) The work highlights the involvement of IGF1R in macropinocytosis-mediated NNV entry and downstream Rac1/Cdc42 activation, thus providing a thorough mechanistic understanding of NNV internalization process. This could pave the way for further exploration of antiviral targets.

      Thanks for your review.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents a detailed study on the role of MmMYL3 in the viral entry of NNV, focusing on its function as a receptor that mediates viral internalization through the macropinocytosis pathway. The use of both in vitro assays (e.g., Co-IP, SPR, and GST pull-down) and in vivo experiments (such as infection assays in marine medaka) adds robustness to the evidence for MmMYL3 as a novel receptor for RGNNV. The findings have important implications for understanding NNV infection mechanisms, which could pave the way for new antiviral strategies in aquaculture.

      Strengths:

      The authors show that MmMYL3 directly binds the viral capsid protein, facilitates NNV entry via the IGF1R-Rac1/Cdc42 pathway, and can render otherwise resistant cells susceptible to infection. This multifaceted approach effectively demonstrates the central role of MmMYL3 in NNV entry.

      Thanks for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line94: SPR analysis? The full name should be provided when it first shows.

      We have defined SPR when it first appears at line 97 in the revised manuscript.

      (2) Moreover, is it too many for a manuscript to have a total of nine figures in the main text? Some of them might be moved to the supplementary file.

      We have merged the previous Fig 4 and Fig 5 and combined Fig 8 and Fig 9, reducing the number of figures to seven. For the specific details of the figure adjustments, please refer to the corresponding figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Expand on the potential therapeutic implications of targeting MYL3 or the IGF1R pathway in aquaculture settings. Including a discussion of how inhibitors could be developed or tested in future research would give practical context to the findings.

      Thanks for your valuable suggestion to expand on the therapeutic implications of targeting MYL3 and the IGF1R pathway in aquaculture. In response, we have discussed potential strategies for developing inhibitors, such as small molecules, peptides, or monoclonal antibodies targeting MYL3 to block its interaction with the viral capsid, and IGF1R inhibitors to prevent macropinocytosis-mediated viral entry. We propose using virtual screening platforms to identify these inhibitors, followed by in vivo testing in aquaculture models. Additionally, combining MYL3 and IGF1R inhibitors could provide a synergistic approach to enhance antiviral efficacy. The relevant discussions have been supplemented at lines 358 to 368 in the revised manuscript.

      (2) It is recommended to include the data regarding the lack of interaction between the CMNV CP and MmMYL3 as a supplementary figure.

      We have included supplementary data demonstrating that CMNV CP does not interact with MmMYL3, highlighting the specificity of MYL3 for RGNNV. For detailed information, please refer to Fig. S4.

      Reviewer #3 (Recommendations for the authors):

      Consider discussing the broader implications of these findings, particularly whether MYL3 might serve as a receptor for other viruses.

      We appreciate this suggestion. Receptor recognition is typically highly specific, and the binding interactions between viral proteins and host receptors often depend on the structural compatibility between the viral capsid or envelope and the host receptor. Our study demonstrates that MYL3 serves as a receptor for NNV based on its direct interaction with the NNV capsid protein (CP). However, when we tested whether MYL3 interacts with CMNV (Covert Mortality Nodavirus), which is phylogenetically close to NNV, we found that CMNV CP does not bind to MYL3. Since MYL3 does not interact with CMNV, a virus closely related to NNV, it is even less likely to function as a receptor for viruses that are more distantly related to NNV. The relevant discussions have been supplemented at lines 306 to 310 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      Although there is a sizeable amount of data reported in the manuscript, there seems to be a chronic issue of lack of details of how some experiments were performed. This is particularly true in the figure legends, which for the most part lack enough details to allow comprehension without constant return to the text. Additionally, 2 figures are missing. Figure 6 is a repetition of Figure 5, and Figure S4 is an identical replicate of Figure S3.

      We gratefully appreciate your professional comments. Additional details on how the related experiments were performed have been added to the Materials and methods section and the figure legends (e.g., see Line 478-487, Line 996-1001, Line 1010-1012, Line 1019-1020, Line 1031-1033, Line 1041-1042, Line 1051-1053, Line 1082-1083, Line 1087-1088, Line 1093-1094, Line 1105-1107, Line 1113-1114). Furthermore, we sincerely apologize for the mistakes and the inconvenience they caused during your review, and we have added the correct Figure 6 (see Line 1043-1053) and Figure S4 (see Line 1084-1088). We will carefully and thoroughly check the whole submitted manuscript, along with the supplementary information, to avoid such mistakes in the future.

      Reviewer #2 (Public review):

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays was carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have two main questions about this work.

      (1) The authors provided the below information about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What are the criteria to choose these samples? Where did these samples originate from? How many strains of bacteria were obtained from which types of samples?

      Sorry for the ambiguous and limited information. More details have been added to the Materials and methods section (see Line 480-496). We gratefully appreciate your professional comments.

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      Sorry for the ambiguous and limited information. We have carefully revised this section, and more details have been added to the Materials and methods section (see Line 480-496). We gratefully appreciate your professional comments.

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consists of 63 Streptococcus strains, 158 Lactobacillus/ Lacticaseibacillus Limosilactobacillus strains, and 69 Enterococcus strains.

      Sorry for the ambiguous information. We have carefully revised this section and added more details (see Line 129-132). We gratefully appreciate your professional comments.

      (2) As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products, and Lacticaseibacillus rhamnosus is the main bacteria in these products. There are also ATCC type strains such as 53103.

      I am sure the authors are also interested to know whether P118 is better as a probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus which are currently commercially used, or from the ATCC. Then, the authors can compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the windows for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We fully agree that including well-known, recognized, or commercial probiotics as a positive control would allow a more comprehensive evaluation of the isolated P118 strain as a probiotic candidate, particularly in comparison with other well-established probiotics, and would also help assess whether the mechanisms described for P118 apply to other L. rhamnosus strains or to lactic acid bacteria in general. These issues will be fully taken into consideration and addressed in future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 - The sentence "with great probiotic properties" suggests that this strain was already known to have probiotic properties. Is that the case?

      We gratefully appreciate your professional comments. The phrase "with great probiotic properties" was intended as a summary of our findings, emphasizing that L. rhamnosus P118 showed great probiotic properties when evaluated by traditional and C. elegans-infection screening strategies. We have revised this sentence (see Line 27-30).

      (2) Line 30 - What exactly do authors mean by "traditional"? They should add a bit more information here as to what these methods would be.

      We gratefully appreciate your professional comments. By "traditional" methods, we refer to time-consuming and labor-intensive strategies for screening probiotic candidates, which include bacterial isolation, culturing, phenotypic characterization, randomized controlled trials, and various in vitro and in vivo tests to assess probiotic properties (Sun et al., 2022). We have indicated this strategy in Line 91-94.

      Reference:

      Sun Y, Li HC, Zheng L, Li JZ, Hong Y, Liang PF, Kwok LY, Zuo YC, Zhang WY, Zhang HP. Iprobiotics: A machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences. Briefings in Bioinformatics 2022;23.

      (3) Line 37 - I believe "harmful microbes" is not the correct term here. I suggest authors use "potentially harmful".

      Done as requested (see Line 36, 209, 212, 217, 381). We gratefully appreciate your valuable suggestions.

      (4) Line 75 - What exactly do authors mean by "irregular dietary consumption"?

      "Irregular dietary consumption" means "irregular dietary habits", "eating irregularly", or "abnormal eating behaviors". We have changed it to "irregular dietary habits" (see Line 76). We gratefully appreciate your professional comments.

      (5) Line 85 - What exactly do authors mean by "without residues in raw food products"?

      Here, "without residues in raw food products" means that probiotics barely remain in food animal products (e.g., meat, eggs, dairy) after livestock and poultry are fed probiotic-supplemented feed. We gratefully appreciate your professional comments.

      (6) Line 86 - Please, give a specific example of yeast.

      Done as requested (see Line 85-86), “yeast (e.g., Saccharomyces boulardii, S. cerevisiae)”. We gratefully appreciate your valuable suggestions.

      (7) Line 112 - Lactobacillus reuteri should be written out, since this is the first time the species name appears in the main text.

      Done as requested (see Line 112). We gratefully appreciate your valuable suggestions.

      (8) Lines 115-118 - Please, rewrite for clarity.

      Done as requested (see Line 115-118). We gratefully appreciate your valuable suggestions.

      (9) Line 118 -Lacticaseibacillus rhamnosus should be written out, since this is the first time the species name appears in the main text.

      Done as requested (see Line 118). We gratefully appreciate your valuable suggestions.

      (10) Line 119 - Throughout the text authors make it seem like strain P118 was previously known. Is that the case? If yes, how was it isolated again? This should be briefly mentioned in the introduction.

      Sorry for the misunderstanding caused by this statement. The P118 strain was isolated and its probiotic properties were evaluated by our lab; it was not previously known. We have revised this sentence (see Line 118-120). We gratefully appreciate your professional comments.

      (11) Line 131 - How were strains identified?

      Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) was employed to identify bacterial species (He et al., 2022). This information is given in the Materials and methods section (see Line 485-489). We gratefully appreciate your professional comments.

      Reference

      He D, Zeng W, Wang Y, Xing Y, Xiong K, Su N, Zhang C, Lu Y, Xing X. Isolation and characterization of novel peptides from fermented products of lactobacillus for ulcerative colitis prevention and treatment. Food Science and Human Wellness 2022;11:1464-74.

      (12) Figure 1 - Legend needs a lot more info. Where are legends to panels PQ? Also, some of the text is too small to read.

      Sorry for the limited information. We have revised the Figure 1 legend and added more information (see Line 1000-1019), and we also provide a vector graphic of Figure 1. We gratefully appreciate your professional comments.

      (13) Line 136 - All strains were screened and 27 strains were positive, right?

      Yes, all strains were screened and 27 strains were positive. We gratefully appreciate your professional comments.

      (14) Figure 2 - What do authors mean by "spleen index" and "liver index"? This should be described in more detail. Also, p values for 'a', 'b', 'ab' should be given.

      The organ indices (spleen index, liver index) were calculated according to the formula: organ index = organ weight (g) / body weight (g) × 1000, as indicated in the Materials and methods section (see Line 587-588). The statement “Different lowercase letters ('a', 'b') indicate a significant difference (P < 0.05)” has been added in Line 1020-1029. We gratefully appreciate your professional comments.
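
      As a minimal illustration of the formula quoted in this reply, the organ index can be computed as sketched below. The function name and the example weights are hypothetical and are not taken from the manuscript.

```python
def organ_index(organ_weight_g: float, body_weight_g: float) -> float:
    """Organ index = organ weight (g) / body weight (g) * 1000."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    if organ_weight_g < 0:
        raise ValueError("organ weight cannot be negative")
    return organ_weight_g / body_weight_g * 1000


# Hypothetical example: a 0.1 g spleen from a 25 g mouse.
spleen_index = organ_index(organ_weight_g=0.1, body_weight_g=25.0)
print(f"spleen index: {spleen_index:.2f}")
```

      Because both weights are in grams, the index is dimensionless; the factor of 1000 only rescales it to a convenient range.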

      (15) Line 212-214 - Again, I suggest authors use "potentially harmful" and "potentially beneficial".

      Done as requested (see Line 36, 210, 213, 218, 383). We gratefully appreciate your valuable suggestions.

      (16) Figure 3 - Which groups were tested in panels CD? Is this based on color? Legends should be restated in panels or clearly marked in the legend.

      Sorry for this mistake. We have revised Figure 3C-D and added the group information (see Line 1013-1020). We gratefully appreciate your professional comments.

      (17) Figure 4 - Lacks details.

      Sorry for the mistakes. We have revised Figure 4D-E and its legend and added the group information (see Line 1031-1037). We gratefully appreciate your professional comments.

      (18) Figure 6 - This is a repetition of Figure 5.

      Sorry for the mistakes, we have added the correct Figure 6 (see Line 1060-1070). We gratefully appreciate your professional comments.

      (19) Lines 329-330 - C. elegans does not "mimic" animal intestinal physiology.

      Sorry for the mistakes, we have revised this statement (see Line 139-142, 324-325). We gratefully appreciate your professional comments.

      (20) Lines 358 and 418 - What do authors mean by "metabolic dysfunction" and "metabolic disorder"? I assume they mean changes in fecal metabolites. However, these are terms that may have different interpretations in the field of human metabolism. Therefore, I would suggest that the authors specify that they mean changes in fecal metabolite profiles when using these terms.

      Sorry for the confusion caused by these terms. We have revised the wording in the revised version (see Line 34-35, 122, 353-354, 413). We gratefully appreciate your professional comments.

      (21) Line 475 - What do authors mean by "superficial effects"?

      Sorry for the mistake. We have changed it to “beneficial/protective effects” (see Line 469, Line 1074). We gratefully appreciate your professional comments.

      (22) Line 486 - Were all yogurts artisanal? Where were piglets from? How were samples collected? Feces, rectal swabs? Does the ethics statement at the end of the manuscript also cover work with piglets?

      Yes, all yogurts were artisanal. The 6 rectal content samples were collected from healthy piglets, free of pathogen infection and diarrhea, on a pig farm in Zhejiang province. Yes, the ethics statement at the end of the manuscript also covers the work with piglets.

      (23) Line 490 - Which MALDI platform was used? The database used can have important implications for strain identification. What was the confidence of ID? This should be included.

      Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) was employed to identify bacterial species with a confidence level > 90%. This information is given in the Materials and methods section (see Line 487-489). We gratefully appreciate your professional comments.

      (24) Line 501 - Is this a widely used method to characterize probiotics? Please, add a reference.

      Done as requested (see Line 498). Many probiotics or microbes can produce milk clotting enzyme to clot milk. It's an important measurement in the dairy industry, especially when making cheese (Zhang et al., 2023; Arbita et al., 2024; Shieh et al., 2009). The milk-clotting activity analysis is usually used for evaluating the potential ability of candidate probiotic isolates in clotting milk into cheeses.

      Reference:

      Zhang Y, Wang J, He J, Liu X, Sun J, Song X, Wu Y. Characteristics and application in cheese making of newly isolated milk-clotting enzyme from bacillus megaterium ly114. Food Res Int 2023;172:113202.

      Arbita AA, Zhao J. Milk clotting enzymes from marine resources and their role in cheese-making: A mini review. Crit Rev Food Sci Nutr. 2024;64(27):10036-10047.

      Chwen-Jen Shieh, Lan-Anh Phan Thi, Ing-Lung Shih. Milk-clotting enzymes produced by culture of Bacillus subtilis natto. Biochemical Engineering Journal. 2009;1(43): 85-91.

      (25) Line 713 - How were fecal metabolites extracted?

      Sorry for the missing information. Details of the fecal metabolite extraction have been added to the Materials and methods section (see Line 705-706). We gratefully appreciate your professional comments.

      (26) Figure 7 - Please correct "macrophages".

      Done as requested (see Figure 7, Line 1072). We gratefully appreciate your valuable suggestions.

      (27) Table 1 - Should read "number of strains", not size.

      Done as requested (see Line 1084). We gratefully appreciate your valuable suggestions.

      (28) Figure S1B - Is this data for P118?

      Sorry for the mistakes, we have revised Figure S1 legend (see Line 1086-1088). We gratefully appreciate your professional comments.

      (29) Figure S3 - Legends C, S, PS, P are not specified.

      Sorry for the missed information, we have revised and added group info in Figure S3 legend (see Line 1095-1101). We gratefully appreciate your professional comments.

      (30) Figure S3B - What is the "clinical symptom score"? How was this determined?

      Sorry for the lack of information. The detailed information has been added to the Materials and methods section (see Line 659-661, Table S7). We gratefully appreciate your professional comments.

      (31) Figure S4 - This is an identical copy of Figure S3.

      Sorry for the mistakes, we have added the correct Figure S4 (see Line 1103-1106). We gratefully appreciate your professional comments.

      (32) Figure S5 - Legend lacks details.

      Sorry for the missed information, we have revised and added group info in Figure S5 legend (see Line 1107-1112). We gratefully appreciate your professional comments.

      (33) Figure S8 - What is "GM"? Since it inhibits growth to a greater extent than the highest metabolite concentration used, I imagine it must be an antibiotic (gentamycin?) as a positive control. This needs to be clearly stated.

      Sorry for the missing information. GM denotes 100 μg/mL gentamicin (see Line 1134). We gratefully appreciate your professional comments.

      (34) Figure S9 - Labels for panels are missing.

      Sorry for the missing information. The panel labels have been added (see Line 1135-1139). We gratefully appreciate your professional comments.

      Reviewer #2 (Recommendations for the authors):

      (1) This reviewer appreciates the efforts of the authors to provide the details related to this work. In the meantime, the manuscript shall be written in a way that is easy for the readers to follow.

      We have tried our best to revise and improve the whole manuscript to make it easy for readers to follow (e.g., see Line 27-30, Line 115-120, Line 129-132, Line 480-496). We gratefully appreciate your valuable suggestions.

      (2) For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or citing other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections had been combined according to the reviewer’s suggestions (see Line 497-530, Line 637-671).

      (3) Another example: the figures have great resolution, but they are way too busy. Figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We deeply agree with you that some submitted figures are way too busy, but it’s not easy to move some results into supplementary information sections, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels had been combined or condensed according to the reviewer’s suggestions (see Line 1000-1020, Line 1052-1071). We gratefully appreciate your professional comments and valuable suggestions.

      (4) Line 30: spell out "C." please.

      Done as requested (see Line 31). We gratefully appreciate your valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca<sup>2+</sup> channel inactivation than pool depletion. Overall, this is a solid study with valuable findings, but the results warrant consideration of alternative interpretations.

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was a possible involvement of Ca<sup>2+</sup> channel inactivation in the strong paired pulse depression (PPD). In the meantime, we have measured the total (free plus buffered) calcium increments induced by each of the first four APs in 40 Hz trains at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca<sup>2+</sup> increments did not differ from one another, arguing against a contribution of Ca<sup>2+</sup> channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue was on the definition of ‘vesicular probability’. Previously, vesicular probability (p<sub>v</sub>) has been used with reference to the releasable vesicle pool which includes not only tightly docked vesicles but also reluctant vesicles. On the other hand, the meaning of p<sub>v</sub> in the present study is the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Below, we provide our point-by-point replies to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory. Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

      Reviewer #2 (Public review):

      Summary:

      Shin et al aim to identify, in a very extensive piece of work, a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to adjust supply-demand aspects. The methods applied are state-of-the-art and address quantitative aspects with high signal to noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin7 function and Ca-dependent phospholipase activity seem critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      (1) While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * p<sub>v</sub>, where n = RRP size and p<sub>v</sub> = vesicular release probability. The value for p<sub>v</sub> critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRP<sub>hyper</sub>) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured p<sub>v</sub> based on AP-evoked EPSCs such as strong paired pulse depression (PPD) and associated failure rates, p<sub>v</sub> in the present study denotes vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (p<sub>occ</sub>) by TS vesicles is subject to dynamic regulation by reversible rate constants (denoted by k<sub>1</sub> and b<sub>1</sub>, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and p<sub>occ</sub>, among which N is a fixed parameter but p<sub>occ</sub> depends on k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca<sup>2+</sup> (Hosoi, et al., 2008), p<sub>occ</sub> is not a fixed parameter. Therefore, release probability should be re-defined as p<sub>occ</sub> * p<sub>v</sub>. Given that N is fixed, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in p<sub>occ</sub> rather than p<sub>v</sub>, because p<sub>v</sub> is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also at the early and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in p<sub>v</sub> of reluctant vesicles.
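
      The bound implied by this argument can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the manuscript's analysis: the function names are hypothetical, and it assumes the simple refilling model with a fixed number of release sites N, so that facilitation equals the ratio of p_occ x p_v during the train to its baseline value. The inputs are the values quoted in this response (baseline p_v above 0.8, STF of about 2.3 times).

```python
"""Illustrative sketch: how much facilitation can p_v or p_occ provide when N is fixed?"""


def occupancy(k1: float, b1: float) -> float:
    """Steady-state occupancy of release sites by TS vesicles, k1 / (k1 + b1)."""
    return k1 / (k1 + b1)


def quantal_content(n_sites: float, p_occ: float, p_v: float) -> float:
    """Mean quantal content m = N * p_occ * p_v."""
    return n_sites * p_occ * p_v


def max_fold_from_pv(p_v_baseline: float) -> float:
    """Largest fold increase in release obtainable by raising p_v alone (p_v <= 1)."""
    return 1.0 / p_v_baseline


def max_baseline_occupancy(fold: float) -> float:
    """Largest baseline p_occ that still allows a `fold`-times rise in p_occ (p_occ <= 1)."""
    return 1.0 / fold


# Values quoted in the response: baseline p_v > 0.8 and STF of about 2.3 times.
print(max_fold_from_pv(0.8))        # p_v alone gives at most a 1.25-fold increase
print(max_baseline_occupancy(2.3))  # baseline p_occ must be below about 0.43
```

      Under these assumptions, raising p_v from 0.8 to 1 cannot account for a 2.3-fold facilitation, whereas a rise in p_occ can, provided that fewer than roughly 43% of release sites are occupied by TS vesicles at rest.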

      We imagine that the Reviewer meant vesicular release or fusion probability (p<sub>v</sub>) by ‘release probability’. If so, p<sub>v</sub> (of TS vesicles) cannot be a major player in STF, because the baseline p<sub>v</sub> is already higher than 0.8 even if it is most parsimoniously estimated (Fig. 2). Moreover, considering very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that p<sub>v</sub> is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca<sup>2+</sup>-dependent step increase in p<sub>v</sub> of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that p<sub>v</sub> of TS vesicles is close to one, an increase in p<sub>v</sub> of TS vesicles cannot account for facilitation. The possibility for activity-dependent increase in fusion probability of LS vesicles (denoted as p<sub>v,LS</sub>) should be considered in two ways depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that p<sub>v,LS</sub> is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by cumulative increase in p<sub>v,LS</sub> of vesicles that reside in a distinct pool. Because the increase in p<sub>v,LS</sub> during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with our results of V-M analysis (Fig. 3), showing that EPSCs during STF fell on the same parabola regardless of stimulation frequencies. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.
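
      The variance-mean prediction used in this argument can be illustrated with a binomial release model. The sketch below is a simplified illustration and not the analysis performed in the manuscript: it assumes identical, independent release sites, a fixed quantal size q, and no quantal variability, and every numerical value and function name is hypothetical. Under these assumptions the variance obeys V = qM - M^2/N, so a change in release probability moves a point along one parabola, whereas recruitment of additional sites places it above the parabola defined by the baseline N.

```python
"""Illustrative binomial variance-mean (V-M) relation with a fixed or recruited N."""

N_BASE = 10  # hypothetical number of release sites at baseline


def binomial_mean_variance(n_sites, p_r, q=1.0):
    """Mean and variance of the response for n_sites sites releasing with probability p_r."""
    mean = n_sites * q * p_r
    variance = n_sites * q ** 2 * p_r * (1.0 - p_r)
    return mean, variance


def parabola_variance(mean, n_sites, q=1.0):
    """Variance predicted at a given mean by the parabola V = q*M - M**2 / N."""
    return q * mean - mean ** 2 / n_sites


def deviation_from_baseline_parabola(n_sites, p_r, q=1.0):
    """Variance in excess of the parabola defined by N_BASE (positive means upward deviation)."""
    mean, variance = binomial_mean_variance(n_sites, p_r, q)
    return variance - parabola_variance(mean, N_BASE, q)


# Changing release probability at fixed N: the points stay on the baseline parabola.
for p_r in (0.2, 0.4, 0.6, 0.8):
    print(p_r, round(deviation_from_baseline_parabola(N_BASE, p_r), 12))

# Recruiting extra sites (N = 15, hypothetical): the point lies above the baseline parabola.
print(deviation_from_baseline_parabola(15, 0.4))
```

      In this simplified picture, only a change in N produces a deviation from the parabola; a change in p_occ or in p_v at fixed N does not, because each site then releases with probability p_occ x p_v.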

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in p<sub>v,LS</sub> that occurs in parallel with STF. Strong PPD, indicative of high p<sub>v</sub>, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against gradual increase in p<sub>v,LS</sub>. One may argue that STF may be mediated by a drastic step increase of p<sub>v,LS</sub> from zero to one, but it is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we incorporated these perspectives into Discussion and further clarified the reasoning behind our conclusions.

      References

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Please note that papers cited in the manuscript are not repeated here.

      (2) Fig 3: I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm biases the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition of release probability, it can be factored into p<sub>v</sub> and p<sub>occ</sub>, which are fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= p<sub>occ</sub> x p<sub>v</sub>). Our novel finding is that the increase in release probability should be attributed to an increase in p<sub>occ</sub>, not to that in p<sub>v</sub>.

      (3) Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with calyx of Held synapses, OAG, a DAG analogue, increased the fast releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, it is expected that the maximal STF (ratio of facilitated to baseline EPSCs) becomes lower as long as the number of release sites (N) is limited. As aforementioned, the baseline p<sub>v</sub> is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline p<sub>occ</sub> seems to be increased by OAG.

      Reference

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      (4) The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial p<sub>v</sub>, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial p<sub>v</sub>. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial p<sub>v</sub> has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial p<sub>v</sub>, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial p<sub>v</sub> already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial p<sub>v</sub>. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on p<sub>v</sub>.

      References

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      (1) The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy. We added an explanation to the Discussion on the difference in the meaning of ‘vesicular release probability’ between the present study and previous studies [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)]. In summary, p<sub>v</sub> in the present study denotes the vesicular release probability of TS vesicles, while previous studies used it for the vesicular release probability of vesicles in the RRP, which include LS and TS vesicles. Accordingly, p<sub>occ</sub> in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at p<sub>v</sub> close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high p<sub>v</sub>, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that p<sub>v</sub> is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in p<sub>occ</sub> (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high p<sub>v</sub> too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low p<sub>v</sub> or for a gradual increase in p<sub>v</sub> of reluctant vesicles during short-term facilitation.  

      The following statement was added to the Discussion in the revised manuscript:

      “Previous studies suggested that an increase in p<sub>v</sub> is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked by a 20 Hz/60-pulse train (denoted as ‘RRP<sub>hyper</sub>’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high p<sub>v</sub> based on the observation of strong PPD and failure rates upon paired stimulations at ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, p<sub>v</sub> in the present study indicates the fusion probability of TS vesicles. For the same reason, p<sub>occ</sub> denotes the occupancy of release sites by TS vesicles. Note that our study does not provide a direct clue as to whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles is kept constant, an increase in p<sub>occ</sub> (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRP<sub>hyper</sub>.”

      (2) Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca<sup>2+</sup> channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca<sup>2+</sup> channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca<sup>2+</sup> channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca<sup>2+</sup>-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca<sup>2+</sup>-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca<sup>2+</sup> to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca<sup>2+</sup> (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca<sup>2+</sup> channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca<sup>2+</sup> channel inactivation. This interpretation seems to be based on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k<sub>1</sub> and b<sub>1</sub>, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca<sup>2+</sup> channel inactivation. Nevertheless, we acknowledge the possibility that Ca<sup>2+</sup> channel inactivation may contribute to PPD, and therefore, we have further investigated this possibility. Specifically, we measured action potential (AP)-evoked Ca<sup>2+</sup> transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca<sup>2+</sup> channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca<sup>2+</sup> channel inactivation is unlikely to contribute to the pronounced PPD.

      Figure 2—figure supplement 2 shows how we measured the total Ca<sup>2+</sup> increments at axonal boutons. First we estimated endogenous Ca<sup>2+</sup>-binding ratio from analyses of single AP-induced Ca<sup>2+</sup> transients at different concentrations of Ca<sup>2+</sup> indicator dye (panels A to E). And then, using the Ca<sup>2+</sup> buffer properties, we converted free [Ca<sup>2+</sup>] amplitudes to total calcium increments for the first four AP-evoked Ca<sup>2+</sup> transients in a 40 Hz train (panels G-I). We incorporated these results into the revised version of our manuscript to provide evidence against the Ca<sup>2+</sup> channel inactivation.

      (3) On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca<sup>2+</sup>-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in p<sub>fusion</sub> of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (p<sub>r</sub>) within a single active zone rather than a heterogeneity in p<sub>fusion</sub> of individual docked vesicles. Therefore both p<sub>occ</sub> and p<sub>v</sub> of TS vesicles would contribute to the p<sub>r</sub> distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). Secondly, strong PPD was observed not only in the baseline but also during early and late phases of facilitation, indicating that vesicles with very high p<sub>v</sub> contribute to EPSC throughout train stimulations (Fig. 2, 3, and 7). These findings argue against the possibilities for recruitment of new release sites harboring low p<sub>v</sub> vesicles and for a gradual increase in fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we incorporated the perspectives into Discussion and further clarified the reasoning behind our conclusions.

      (4) In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca<sup>2+</sup> below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the Reviewer suggested, low external Ca<sup>2+</sup> concentration can lower release probability (p<sub>r</sub>). Given that both p<sub>v</sub> and p<sub>occ</sub> are regulated by [Ca<sup>2+</sup>]<sub>i</sub>, low external [Ca<sup>2+</sup>] may affect not only p<sub>v</sub> but also p<sub>occ</sub>, both of which would contribute to low p<sub>r</sub>. Under such conditions, it would be plausible that the baseline p<sub>r</sub> becomes much lower than 0.1 due to low p<sub>v</sub> and p<sub>occ</sub> (for instance, if p<sub>v</sub> decreases from 1 to 0.5, and p<sub>occ</sub> from 0.3 to 0.1, then p<sub>r</sub> = 0.05), and then p<sub>r</sub> (= p<sub>v</sub> x p<sub>occ</sub>) has room for an increase by a factor of ten (to 0.5, for example) by short-term facilitation as cytosolic [Ca<sup>2+</sup>] accumulates during a train.
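
      The numerical example in the preceding paragraph can be written out explicitly. The values are the illustrative ones quoted above, not measurements, and the function and variable names are hypothetical.

```python
"""Worked arithmetic for the low-Ca2+ example, with p_r = p_v * p_occ."""


def release_probability(p_v: float, p_occ: float) -> float:
    """Release probability per site as the product of fusion probability and occupancy."""
    return p_v * p_occ


# Illustrative values from the response (not measurements).
p_r_normal = release_probability(p_v=1.0, p_occ=0.3)   # normal external Ca2+
p_r_low_ca = release_probability(p_v=0.5, p_occ=0.1)   # low external Ca2+
p_r_facilitated = 0.5                                   # example value reached during a train

# Fold facilitation available in low Ca2+ for this example (about 10).
fold_low_ca = p_r_facilitated / p_r_low_ca

# Upper bound on facilitation in normal Ca2+, because p_r cannot exceed 1 (about 3.3).
headroom_normal = 1.0 / p_r_normal

print(p_r_normal, p_r_low_ca, fold_low_ca, headroom_normal)
```

      With these example numbers, lowering both p_v and p_occ leaves room for roughly tenfold facilitation, whereas the same synapse in normal Ca<sup>2+</sup> could facilitate by a factor of about three at most.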

      If p<sub>v</sub> is close to one, p<sub>r</sub> depends on p<sub>occ</sub>, and thus facilitation depends on the number of TS vesicles just before arrival of each AP of a train. Thus, post-train recovery from facilitation would depend on restoration of equilibrium between TS and LS vesicles to the baseline. Even if transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which is comprised of a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca<sup>2+</sup> binding to Syt7 are slow, it is plausible that Syt7 may contribute to short-term facilitation (STF) by Ca<sup>2+</sup>-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, the number of LS vesicles would slowly accumulate in a Syt7 and Ca<sup>2+</sup>-dependent manner, and this increase in LS vesicles would shift LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by slow recovery of LS vesicles.
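
      The qualitative behaviour of such a two-step scheme can be sketched numerically. The code below is a toy illustration only and is not the model fitted in the manuscript: it treats each release site as empty, LS-occupied, or TS-occupied, ignores vesicle release itself, and mimics the train by a step increase in the rate of the first step. All rate constants and names are hypothetical and were chosen only to separate a slow first step from a fast second step.

```python
"""Toy two-step priming scheme: recycling pool (empty site) <-> LS <-> TS."""


def steady_state(a, a_back, k, b):
    """Equilibrium fractions (empty, LS, TS) of release sites for a linear three-state chain."""
    ls_over_empty = a / a_back
    ts_over_ls = k / b
    empty = 1.0 / (1.0 + ls_over_empty + ls_over_empty * ts_over_ls)
    ls = empty * ls_over_empty
    return empty, ls, ls * ts_over_ls


def simulate(state, a, a_back, k, b, duration, dt=1e-3):
    """Forward-Euler integration of the site fractions for `duration` seconds."""
    empty, ls, ts = state
    for _ in range(int(round(duration / dt))):
        d_empty = a_back * ls - a * empty  # slow first step (de novo priming)
        d_ts = k * ls - b * ts             # fast second step (LS <-> TS)
        d_ls = -d_empty - d_ts             # the total number of sites is conserved
        empty += d_empty * dt
        ls += d_ls * dt
        ts += d_ts * dt
    return empty, ls, ts


# All rates (per second) are hypothetical.
A_REST, A_TRAIN = 0.03, 1.0  # empty -> LS at rest and during a train (assumed Ca2+/Syt7-dependent)
A_BACK = 0.1                 # LS -> empty
K_FAST, B_FAST = 10.0, 13.0  # LS -> TS and TS -> LS

rest = steady_state(A_REST, A_BACK, K_FAST, B_FAST)
after_train = simulate(rest, A_TRAIN, A_BACK, K_FAST, B_FAST, duration=2.0)
early_recovery = simulate(after_train, A_REST, A_BACK, K_FAST, B_FAST, duration=0.5)
late_recovery = simulate(after_train, A_REST, A_BACK, K_FAST, B_FAST, duration=60.0)

for label, state in [("rest", rest), ("after train", after_train),
                     ("0.5 s recovery", early_recovery), ("60 s recovery", late_recovery)]:
    print(f"{label:>15}: TS occupancy = {state[2]:.3f}")
```

      In this toy scheme the TS occupancy rises when the first step is accelerated and returns to baseline over many seconds afterwards, even though the LS/TS exchange itself equilibrates within tens of milliseconds.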

      Reference

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Please note that papers cited in the manuscript are not repeated here.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      Slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Higher baseline occupancy does not necessarily imply faster recovery of PPR either. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k<sub>1</sub> / (k<sub>1</sub> + b<sub>1</sub>) and (k<sub>1</sub> + b<sub>1</sub>), respectively. The baseline occupancy depends on k<sub>1</sub>/b<sub>1</sub>, while the PPR recovery rate depends on the absolute values of k<sub>1</sub> and b<sub>1</sub>. Based on p<sub>occ</sub> and the PPR recovery time constant of WT and KD synapses, we expect higher k<sub>1</sub>/b<sub>1</sub> but lower values for (k<sub>1</sub> + b<sub>1</sub>) in Syt7 KD synapses compared to WT ones.
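
      The relation between occupancy and recovery rate can be made concrete by inverting the two expressions above. This is a minimal sketch under the simple refilling model: the recovery rates (23/s for WT, 18.5/s for KD) are those quoted in this response, whereas the baseline occupancies (0.3 and 0.5) and the function names are hypothetical and serve only to illustrate a higher occupancy in KD synapses.

```python
"""Sketch: k1 and b1 implied by a given baseline occupancy and PPR recovery rate."""
import math


def rates_from_occupancy(p_occ, recovery_rate):
    """Invert p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1 to obtain (k1, b1)."""
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


def occupancy_after(t, p_start, p_occ, recovery_rate):
    """Occupancy at time t (s) while relaxing exponentially from p_start towards p_occ."""
    return p_occ + (p_start - p_occ) * math.exp(-recovery_rate * t)


# Recovery rates are from the response; the occupancies are hypothetical.
k1_wt, b1_wt = rates_from_occupancy(p_occ=0.3, recovery_rate=23.0)
k1_kd, b1_kd = rates_from_occupancy(p_occ=0.5, recovery_rate=18.5)

print("WT: k1/b1 =", k1_wt / b1_wt, " k1+b1 =", k1_wt + b1_wt)
print("KD: k1/b1 =", k1_kd / b1_kd, " k1+b1 =", k1_kd + b1_kd)

# Fraction of the steady-state occupancy regained 50 ms after complete depletion.
print(occupancy_after(0.05, 0.0, 0.3, 23.0) / 0.3)
print(occupancy_after(0.05, 0.0, 0.5, 18.5) / 0.5)
```

      With these example values the KD synapse has the larger k<sub>1</sub>/b<sub>1</sub> ratio, and hence the higher resting occupancy, yet regains a smaller fraction of its steady-state occupancy in a given interval because k<sub>1</sub> + b<sub>1</sub> is smaller.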

      The lower number of release sites (N) in Syt7-KD synapses was not anticipated. As you suggested, such low N might be ascribed to little recruitment of release sites during a train in KD synapses. But our results do not support this model. If silent release sites are recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al. 2020). This was not the case in our results (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and to facilitate the conduction of local depolarization in the illumination area to axon terminals. The lack of EPSC in the presence of 4-AP and TTX indicates that the illumination area is distant enough from axon terminals that optic stimulation-induced local depolarization does not evoke synaptic transmission. This methodology has been employed in previous studies including the work of Little and Carter (2013).

      Reference

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate what aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline p<sub>occ</sub> and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knock-down experiments also provided us with information on the critical role of Syt7 in replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments:

      (1) While the authors claim that Syt7-mediated facilitation is connected to the behavioral deficits they observed, this link is still somewhat speculative. This manuscript could benefit from further discussions of other alternative mechanisms to consider.

      We added the following statement to the Discussion of the revised manuscript:

      “The acquisition of trace fear memory was impaired by inhibition of persistent activity in mPFC during trace period (Gilmartin et al., 2013). The similar deficit observed in Syt7 KD animals is consistent with the hypothesis that STF provides bi-stable ensemble activity in a recurrent network (Mongillo et al., 2012). Nevertheless, alternative mechanisms may be responsible for the behavioral deficit. Not only recurrent network but also long-range loop between the mPFC and the mediodorsal (MD) thalamus play a critical role in maintaining persistent activity within the mPFC especially for a delay period longer than 10 s (Bolkan et al., 2017). Prefrontal L2/3 is heavily innervated by MD thalamus, and L2/3-PCs subsequently relay signals to L5 cortico-thalamic (CT) neurons (Collins et al., 2018). Given that L2/3 is an essential component of the PFC-thalamic loop, loss of STF at recurrent synapses between L2/3 PCs may lead to insufficient L2/3 inputs to L5 CT neurons and failure in the reverberant PFC-MD thalamic feedback loop. Therefore, not only L2/3 recurrent network but also its output to downstream network should be considered as a possible network mechanism underlying behavioral deficit caused by Syt7 KD L2/3.”

      (2) The authors mention that Syt7 contributes to persistent activity during working memory tasks but focus on using only a trace fear conditioning task. However, it would be interesting to see if their results are generalizable to other working memory tasks (i.e. a delayed alternation task).

      We thank the Reviewer for the insightful suggestion. Trace fear conditioning (tFC) shares behavioral properties with working memory (WM) tasks in that tFC is vulnerable to attentional distraction and to the load of a WM task. In general, WM tasks, including delayed alternation tasks such as a T-maze task, require persistent activity of ensemble neurons representing target-specific information among multiple choices. Unlike such WM tasks, tFC is not appropriate for examining target-specific ensemble activity. Because in vivo recordings in KD animals during delayed alternation tasks are not trivial, the effect of Syt7 KD in such tasks is better addressed in a separate study.

      (3) The figure legend in Figure 6A and 6B mentions dotted lines and broken lines in the figure. However, this is confusing, and it is unclear as to what these lines are referring to in the figure.

      To avoid confusion in the figure legend for Figure 6A and 6B, we corrected “dotted line” to “vertical broken line”, and “broken lines” to “dashed parabolas”.

      (4) The manuscript can benefit from close reading and editing to catch typos and improve general readability (i.e. line 173: the word "are" is repeated twice).

      We corrected typographical errors throughout the manuscript and carefully read the manuscript to improve readability. A revised version reflecting these corrections has been prepared and will be resubmitted for your consideration.

      Reviewer #3 (Recommendations for the authors):

      The points in this section are all minor.

      (1) Line 44: Define release probability (p_r) more clearly. Authors use it to mean p<sub>v</sub>*p<sub>occ</sub>, but others routinely use it to mean p<sub>v</sub>*p<sub>occ</sub>*N.

      We understand that the Reviewer meant “others routinely use it to mean p<sub>v</sub>”. In this statement, we meant the conventional definition of release probability, which is the release probability among vesicles of the RRP. We think that it is not appropriate to re-define release probability as p<sub>v</sub> * p<sub>occ</sub> in this first paragraph of the Introduction. Therefore, we clarified this issue in the Discussion, as we mentioned in our reply to the first weakness raised by Reviewer #3.

      (2) Line 82: For clarity, define better what recurrent excitatory synapses are. It seems that synapses between L2/3 PCs and local targets may all be recurrent?

      Both L2/3 and L5 of the prefrontal cortex harbor intralaminar recurrent excitatory synapses between pyramidal cells, which form a recurrent network. Previous theoretical studies have proposed that a single-layer recurrent network model can have bi-stable E/I balanced states (up- and down-states) if recurrent excitatory synapses display short-term facilitation (STF), and thus is able to temporarily hold information once external input shifts the network to the up-state. In this theory, synapses to local targets across layers are not considered, and the specific roles of L2/3 and L5 in working memory tasks are still elusive. For clarity, we added a statement at the beginning of the paragraph (line 82): “Each of layer 2/3 (L2/3) and layer 5 (L5) of neocortex displays intralaminar excitatory synapses between pyramidal cells comprising a recurrent network (Holmgren et al., 2003; Thomson and Lamy, 2007)”.

      (3) Cite earlier studies of short-term synaptic plasticity at synapses between L2/3 pyramidal neurons and local targets in mPFC. If there are none, take more explicit credit for being first.

      As we mentioned in the Introduction, previous studies on short-term plasticity (STP) at neocortical excitatory recurrent synapses have focused on synapses between L5 pyramidal cells (PCs) (Hempel et al. 2000; Wang et al. 2006; Morishima et al., 2011; Yoon et al., 2020). The local connectivity between L2/3 PCs in the somatosensory cortex has been elucidated by Holmgren et al. (2003) and Ko et al. (2011). Although these studies showed STP of EPSPs, it was measured at a fixed frequency or stimulus pattern at high external [Ca<sup>2+</sup>] (2 mM). There is a study on the frequency dependence of STP of EPSPs between L2/3 PCs (Feldmeyer et al., 2006). Unlike our study, Feldmeyer et al. (2006) observed monotonic STD at all frequencies below 50 Hz, but this study was done in the somatosensory cortex and at high external [Ca<sup>2+</sup>] (2 mM). To our knowledge, no previous study has investigated STP at recurrent excitatory synapses of L2/3 pyramidal cells of the mPFC, especially at physiological external [Ca<sup>2+</sup>]. The present study, therefore, represents the first extensive investigation of STP at recurrent excitatory synapses in L2/3 of the mPFC under physiologically relevant external [Ca<sup>2+</sup>].

      References

      Feldmeyer D, Lubke J, Silver RA, Sakmann B (2002) Synaptic connections between layer 4 spiny neurone-layer 2/3 pyramidal cell pairs in juvenile rat barrel cortex: physiology and anatomy of interlaminar signalling within a cortical column. J Physiol 538:803-822.

      Holmgren C, Harkany T, Svennenfors B, Zilberter Y (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. J Physiol 551:139-153.

      Ko H, Hofer SB, Pichler B, Buchanan KA, Sjöström PJ, Mrsic-Flogel TD (2011) Functional specificity of local synaptic connections in neocortical networks. Nature 473:87-91.

      Morishima M, Morita K, Kubota Y, Kawaguchi Y (2011) Highly differentiated projection-specific cortical subnetworks. Journal of Neuroscience 31:10380-10391.

      Wang Y, Markram H, Goodman PH, Berger TK, Ma J, Goldman-Rakic PS (2006) Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat Neurosci 9:534-542.

      (4) I couldn't figure out the significance of Figure S3. Perhaps this could be explained better.

      Optical minimal stimulation methods have not been previously documented in detail. This figure illustrates which parameters should be carefully examined in order to attain optical minimal stimulation, which ideally stimulates a single afferent fiber. Single-fiber stimulation by optical minimal stimulation is supported by the similarity of our estimate for the number of release sites (N) to the previous morphological estimate (Holler et al., 2021). For minimal stimulation, a collimated DMD-coupled LED was employed to restrict 470 nm illumination to a small and well-defined region within layer 2/3 of the prelimbic mPFC, and we carefully adjusted the illumination radius such that an illumination one step smaller (by 1 μm) failed to evoke EPSCs. Our typical illumination area ranged between 3 and 4 μm, as shown in Figure S3A. Under this minimal illumination area, we confirmed unimodal distributions for the EPSC parameters (amplitude, rise time, decay time and time to peak; Figure 3B-E). Otherwise, we excluded the recordings from analysis. We hope this explanation provides a clearer understanding of the figure's significance.

      (5) Note that CTZ seems to alter p_r at some synapses.

      We acknowledge that CTZ can increase release probability by blocking presynaptic K<sup>+</sup> currents. Indeed, Ishikawa and Takahashi (2001) reported that CTZ slowed the repolarizing phase of presynaptic action potentials and increased the frequency of miniature EPSCs at the calyx synapses. Consistently, we observed a slight increase in the baseline EPSC amplitude, from 33.3 pA to 41.9 pA (p=0.045), following the application of 50 µM CTZ. However, given that vesicular release probability (p<sub>v</sub>) is already close to 1 at the synapse of our interest, we believe that the observed effect is more likely attributable to an increase in release site occupancy (p<sub>occ</sub>), which would be reflected as an increase in miniature EPSC frequency in Ishikawa and Takahashi (2001). Given that PPR depends on p<sub>v</sub> rather than p<sub>occ</sub>, this increase in p<sub>occ</sub> would not critically change our conclusion that AMPA receptor desensitization is not responsible for the strong PPD.
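
      The statement that PPR depends on p<sub>v</sub> rather than p<sub>occ</sub> can be seen in a minimal worked example. It rests on simplifying assumptions that are not stated in the response: no refilling of release sites between the two pulses, an unchanged p<sub>v</sub> on the second pulse, and a quantal size q (a symbol introduced here for illustration).

```latex
\begin{align}
\mathrm{EPSC}_1 &= N\,q\,p_{occ}\,p_v \\
\mathrm{EPSC}_2 &= N\,q\,p_{occ}\,(1-p_v)\,p_v \\
\mathrm{PPR} &= \frac{\mathrm{EPSC}_2}{\mathrm{EPSC}_1} = 1 - p_v
\end{align}
```

      Under these assumptions p<sub>occ</sub> cancels in the ratio, so a change in occupancy scales both EPSCs without altering PPR, and a p<sub>v</sub> close to 1 yields strong PPD.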

      Reference

      Ishikawa, T., & Takahashi, T. (2001). Mechanisms underlying presynaptic facilitatory effect of cyclothiazide at the calyx of Held of juvenile rats. The Journal of Physiology, 533(2), 423-431.

      (6) Figure 8B. The result in Figure 8C seems important, but I couldn't figure out why behaviour was not altered during the acquisition phase summarized in Figure 8B. Perhaps this could be explained more clearly for non-experts.

      Little difference in freezing behavior during acquisition has also been observed when prelimbic persistent firing was optogenetically inhibited (Gilmartin et al., 2013). Not only the CS (tone) but also other sensory inputs (visual, olfactory, etc.) and the spatial context could serve as cues predicting the US (shock). Moreover, during the acquisition phase, the presence of the electric shock inherently induces a freezing response as a natural defensive behavior, which may obscure specific behavioral changes related to the associative learning process. Therefore, the freezing behavior during acquisition cannot be regarded as a sign of a specific association between CS and US. Instead, on the next day, we specifically evaluated the CS-US association of the conditioned animals by measuring freezing behavior in response to the CS in a distinct context. We explicitly documented the small difference between WT and KD animals during the acquisition phase in the relevant paragraph (line 397).

  2. Apr 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given that you later describe this as your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along in the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data).

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1). Also, it was significantly higher than that of stage 32 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies and I don't think your list is exhaustive. You need to expand this list and history, which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While it was a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above. Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said, there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at more taxa than you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about what other elasmobranchs are mineralizing in their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph is describing why you are doing a particular method and comparison because it shows me (the reader) where you're sampling from. Something else: maybe as part of the first figure, rather than having just the graphs, include a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would then relate back to clarifying other figures as well.

      Done (also adding a phylogenetic tree).

      The second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format, that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up the entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7 so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”. We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the catshark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put those into methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different Micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimaera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods (lines 486-488).

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig. 1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, which actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characteristics of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development, that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In times of Gadow & Abbott (1895) science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species) on the other hand, have bone but do not have SCPP genes (Liu et al. 2016). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor, at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      No change; see above

      L53: down tune language, remove "severely" and "major"

      Done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      No change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      No change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      Changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      Thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      All regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      Sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      Added Arratia and Kolliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      References to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      Sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      Done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term "non-tesseral trabecular mineralization. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      No change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologists have not been "careful" enough?

      Apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      Sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      No action; see above

      L436: remove paragraph

      No action; see above

      L450: perhaps add summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      Yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study. (lines 440-453)

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript investigates the role of the membrane-deforming cytoskeletal regulator protein Abba in cortical development and its potential implications for microcephaly. It is a valuable contribution to the understanding of Abba's role in cortical development. The strengths and weaknesses identified in the manuscript are outlined below:

      Clinical Relevance:

      The authors identified a patient with microcephaly and intellectual disability harboring the R671W variant of Abba, adding a clinically relevant dimension to the study.

      Mechanistic Insights:

      The study offers valuable mechanistic insights into the development of microcephaly by elucidating the role of Abba in radial glial cell proliferation, radial fiber organization, and the migration of neuronal progenitors. The identification of Abba's involvement in the cleavage furrow during cell division, along with its interaction with Nedd9 and positive influence on RhoA activity, adds depth to our understanding of the molecular processes governing cortical development.

      In Vivo Validation:

      The overexpression of mutant Abba protein (R671W), which results in phenotypic similarities to Abba knockdown effects, supports the significance of Abba in cortical development.

      Weaknesses:

      The findings in the study suggest that heterozygous expression of the R671W variant may exert a dominant-negative effect on ABBA's role, disrupting normal brain development and leading to microcephaly and cognitive delay. However, evidence also points to a possible gain-of-function effect, as the mutation does not decrease RhoA activity or PH3 expression in vivo. Additionally, the impact of ABBA depletion on cell fate is not fully addressed. While abnormal progenitor accumulation in the ventricular and subventricular zones is observed, the transition of progenitors to neuroblasts and their ability to support neuroblast migration remains unclear. Impaired cleavage furrow ingression and disrupted Nedd9 and RhoA signaling could lead to structural abnormalities in radial glial progenitors, affecting their scaffold function and neuroblast progression. The manuscript lacks an exploration of the loss or decrease in interaction between Abba and NEDD9 in the case of the pathogenic patient-derived mutation in Abba. Furthermore, addressing the changes in localization and interaction of NEDD9 following over-expression of the mutant is important to further mechanistically characterize this interaction in future studies. These gaps suggest the need for further exploration of ABBA's role in progenitor cell fate and neuroblast migration to clarify its mechanistic contributions to cortical development.

      (1) Response to statement on dominant-negative vs. gain-of-function effect of R671W variant:

      We appreciate the reviewer’s thoughtful analysis of the potential mechanisms underlying the R671W variant. We agree that the heterozygous expression of the human R671W mutation may initially suggest a dominant-negative effect. However, our data indicate that this variant may instead exert a gain-of-function effect. As highlighted in the discussion section, overexpression of ABBA-R671W in cells that also express wild-type ABBA did not result in a dominant-negative decrease in RhoA activation nor affect PH3 expression in vivo. These findings suggest that the R671W mutation does not impair the canonical ABBA-mediated activation of RhoA, and instead, the resulting phenotype may involve post-mitotic processes, such as altered cell migration. This interpretation is further supported by previous clinical studies reporting additional patients with the same mutation and phenotypic outcomes.

      (2) Response to statement on ABBA depletion and progenitor-to-neuroblast transition:

      We agree that the question of how ABBA depletion affects cell fate and the progression of radial glial progenitors (RGPs) to neuroblasts is of significant importance. Our findings suggest that ABBA knockdown disrupts cleavage furrow ingression, which may block radial glial cells prior to abscission. This likely contributes to the observed accumulation of cells in the ventricular and subventricular zones, as seen in Figures 2A and 4D. Additionally, disrupted Nedd9 expression and impaired RhoA signaling appear to alter the structural integrity of RGPs, leading to detachment of apical and basal endfeet (Supplementary Figure 3). These structural abnormalities compromise the ability of RGPs to function as scaffolds for neuroblast migration. Although direct live imaging of neuroblast migration was beyond the scope of the current dataset, we believe our evidence strongly supports a model in which ABBA depletion disrupts progenitor structure and migration. Future studies will address these transitions more directly using live imaging and fate-mapping strategies.

      (3) Response to statement on loss of interaction between ABBA and NEDD9 with the R671W mutation:

      We fully agree with the importance of investigating whether the R671W mutation alters ABBA’s interaction with NEDD9. While our study provides evidence for a role of NEDD9 in mediating ABBA function, we acknowledge that we did not directly test whether the R671W mutation disrupts this interaction. We apologize if our manuscript conveyed the impression that this point had been fully addressed. Due to technical limitations, particularly the poor performance of anti-NEDD9 antibodies in slice immunohistochemistry, we were unable to reliably assess the interaction or localization changes in vivo. Nevertheless, this remains a priority for future studies aimed at better understanding the mechanistic underpinnings of the R671W mutation.

      (4) Response to statement on future directions for mechanistic characterization of NEDD9 localization and interaction:

      We agree with the reviewer that further investigation into NEDD9 localization and its interaction with the ABBA R671W mutant is essential to better define the molecular consequences of this mutation. Unfortunately, as mentioned above, the current tools available to us did not permit reliable immunohistochemical detection of NEDD9 in tissue. We fully intend to pursue alternative approaches, such as tagging strategies or the use of more sensitive detection platforms, to determine whether the R671W mutation affects the subcellular localization or stability of the ABBA-NEDD9 interaction. These experiments will be critical to elucidate the pathway through which ABBA regulates progenitor cell behavior and cortical development.

      Reviewer #2 (Public review):

      Summary:

      Carabalona and colleagues investigated the role of the membrane-deforming cytoskeletal regulator protein Abba (MTSS1L/MTSS2) in cortical development to better understand the mechanisms of abnormal neural stem cell mitosis. The authors used short hairpin RNA targeting Abba20 with a fluorescent reporter coupled with in utero electroporation of E14 mice to show changes to neural progenitors. They performed flow cytometry for in-depth cell cycle analysis of the impact of Abba-shRNA on neural progenitors and determined an accumulation in S phase. Using cultured rat glioma cells and live imaging from cortical organotypic slices from mice in utero electroporated with Abba-shRNA, the authors found Abba played a prominent role in cytokinesis. They then used a yeast-two-hybrid screen to identify three high confidence interactors: Beta-Trcp2, Nedd9, and Otx2. They used immunoprecipitation experiments from E18 cortical tissue coupled with C6 cells to show Abba requirement for Nedd9 localization to the cleavage furrow/cytokinetic bridge. The authors performed an shRNA knockdown of Nedd9 by in utero electroporation of E14 mice and observed similar results as with the Abba-shRNA. They tested a human variant of Abba using in utero electroporation of cDNA and found disorganized radial glial fibers and misplaced, multipolar neurons, but did not observe the impact on cell division seen in the shRNA-Abba model.

      Strengths:

      Fundamental question in biology about the mechanics of neural stem cell division.

      Directly connecting effects in Abba protein to downstream regulation of RhoA via Nedd9.

      Incorporation of human mutation in ABBA gene.

      Use of novel technologies in neurodevelopment and imaging.

      Weaknesses:

      Unexplored components of the pathway (such as what neurogenic populations are impacted by Abba mutation) and unleveraged aspects of their data (such as the live imaging) limit the scope of their findings and leave significant questions about the effect of ABBA on radial glia development.

      (1) Claim of disorganized radial glial fibers lacks quantifications.

      - On page 11, the authors claim that knockdown of Abba led to changes in radial glial morphology observed with vimentin staining. Here they claim misoriented apical processes, detached end feet, and decreased number of RGP cells in the VZ. However, they do not provide quantification of process orientation to better support their first claim. Measurements of radial glia fiber morphology (directionality, length) and of angle of division would be metrics that can be applied to data. Some of these analyses could be done in their time-lapse microscopy images, such as to quantify the number of cell divisions during their period of analysis (though that is short-15 hours).

      Response to: Lack of quantification of disorganized radial glial fibers and cell divisions in time-lapse data

      We appreciate the reviewer’s insightful comment regarding the need for quantification of radial glial (RG) fiber morphology. In the revised manuscript, we have addressed this by providing new quantification of changes in vimentin staining, specifically measuring the dispersion of the signal as a proxy for fiber disorganization (see Supplementary Figure 1). These data support the observed morphological changes, including misoriented apical processes and detachment of endfeet, following Abba knockdown.

      Regarding time-lapse analysis to track cell divisions, we attempted to follow individual cells during the 15-hour imaging window. However, due to the relatively short duration and limited number of cells that could be reliably tracked, the dataset did not allow for statistically meaningful conclusions. As an alternative approach, we performed live-cell imaging using Anillin-GFP, a reliable marker of mitotic progression. The distribution and accumulation of Anillin-GFP were analyzed in ABBA-shRNA3 and control conditions, and the results (now included in Supplementary Figure 3) indicate an increased number of cells arrested in late mitosis upon ABBA knockdown. This supports the notion of disrupted cytokinesis as a consequence of Abba depletion.

      (2) Unclear where effect is:

      - In RG or neuroblasts? Is it in cell cleavage that results in accumulation of cells at VZ (as sometimes indicated by their data like in Fig 2A or 4D)? Interrogation of cell death (such as by cleaved caspase 3) would also help. Given their time lapse, can they identify what is happening to the RG fiber? The authors describe a change in "migration" but do not show evidence for this for either progenitor or neuroblast populations. Given they have nice time-lapse imaging data, could they visualize progenitor versus young neuron migration? Analysis of neuroblasts (such as with doublecortin expression in the tissue) would also help understand any issues in migration (of neurons v stem cells).

      - At cleavage furrow? In abscission? There is high resolution data that highlights the cleavage furrow as the location of interest (fig 3A), however there is also data (fig 3B) to suggest Abba is expressed elsewhere as well and there is an overall soma decrease. More detail of the localization of Abba during the division process would be helpful-for example, could cleavage furrow proteins, such as Aurora B, co-localization (and potentially co-IP) help delineate subpopulations of Abba protein? Furthermore, the FRET imaging is a unique way to connect their mutation with function-could they measure/quantify differences at furrow compared to rest of soma to further corroborate that Abba-associated RhoA effect was furrow-enriched?

      - The data highlights nicely that a furrow doesn't clearly form when ABBA expression and subsequent RhoA activity are decreased (in Fig 3 or 5A). Does this lead to cells that can't divide because of poor abscission, especially since "rounding" still occurs? Or abnormal progenitors (with loss of fiber or inability to support neuroblast migration)? Or abnormal progression of progenitors to neuroblasts?

      Response to: Unclear location of the effect (RG vs. neuroblasts; cleavage furrow/abscission; migration issues)

      We thank the reviewer for this comprehensive and thought-provoking set of questions.

      a) Site of the effect – Radial Glia vs. Neuroblasts:

      Our data suggest that the primary effect of ABBA depletion occurs in radial glial progenitors (RGPs), specifically prior to abscission. We observed accumulation of electroporated cells in the ventricular zone (VZ), which we interpret as a result of cytokinetic failure (e.g., Figure 2A, 4D). We also documented detachment of apical and basal endfeet (see Supplementary Figure 3), further supporting structural disruption of RG fibers.

      b) Cell death analysis:

      We considered using cleaved caspase-3 as a marker for apoptosis, but due to its transient and non-specific activation during development, we opted to assess overall survival via quantification of RGP cell numbers and localization. This approach better reflects the developmental impact of ABBA knockdown (Supplementary Figure 3).

      c) Migration defects:

      We agree that distinguishing between progenitor and neuroblast migration would be highly informative. Although we did not perform doublecortin or similar staining to differentiate these populations in this dataset, the accumulation of electroporated cells in VZ/SVZ strongly suggests a migration deficit. Addressing this in detail will require new experiments using lineage-specific markers and longer time-lapse recordings, which we plan to explore in future studies.

      d) Cleavage furrow and abscission:

      Our high-resolution imaging of Anillin-GFP and FRET-based RhoA activity shows that ABBA localizes predominantly at the cleavage furrow. New quantifications of RhoA activity (now in Figure 5) show that the reduction in signaling is most pronounced at the furrow in ABBA knockdown cells. These findings align with the hypothesis that ABBA, through Nedd9 and RhoA, is essential for proper furrow formation and abscission.

      e) Mechanistic implications:

      As the reviewer notes, ABBA knockdown leads to cells that "round" but do not complete division, likely due to poor cleavage furrow ingression. This could generate abnormal progenitors that are structurally compromised (detached fibers) and thus unable to support neuroblast migration or proper differentiation. The cumulative result is disrupted progression from RGPs to neuroblasts, impaired structural scaffolding, and possibly reduced cell viability.

      (3) Limited to a singular time point of mouse cortical development

      On page 13, the authors outline the results of their Y2H screen with the identification of three high confidence interactors. Notably, they used an E10.5-E12.5 mouse brain embryo library rather than one that includes E14, the age of their in utero electroporation mice. Many of the authors' claims focus on in utero electroporation of shRNA-Abba of E14 mice that are then evaluated at E16-18. Justification for the focus on this age range should be included to support that their findings can then be applied to all of mouse corticogenesis.

      Response to: Use of E10.5–E12.5 library for yeast-two-hybrid (Y2H) screen

      We appreciate the reviewer’s concern regarding the developmental stage of the Y2H library. We chose the E10.5–E12.5 brain embryo library based on prior work demonstrating that ABBA expression is strongest during early cortical development, particularly in radial glia at these stages (see Saarikangas et al., J Cell Sci 2008). The radial glia-specific expression of ABBA was previously validated using RC2 and Tuj1 markers at E12.5. Thus, the library we used is well-suited for identifying interactors relevant to radial glial function, including Nedd9. We have clarified this rationale in the revised manuscript.

      (4) Detail of the effect of the human variant of the ABBA mutation in mouse is lacking.

      Their identification of the R671W mutation is interesting and the IUE model warrants more characterization, as they did with their original KD experiments.

      - Could they show that Abba protein levels are decreased (in either cell lines or electroporated tissue)?

      - While time-lapse morphology might not have been performed, more analysis on cell division phenotype (such as plane of division and radial glia morphology) would be helpful.

      Response to: Lack of detail on R671W human variant effects

      We thank the reviewer for encouraging further characterization of the R671W variant. In the revised manuscript, we now provide additional data on interkinetic nuclear migration (INM) defects resulting from R671W overexpression (see Supplementary Figure 3). These changes are consistent with disrupted radial glial organization and mirror aspects of the ABBA knockdown phenotype.

      a) Protein levels:

      We quantified ABBA expression in cells overexpressing the R671W variant (Supplementary Figure 5) and found no significant reduction compared to wild-type. This argues against a loss-of-function mechanism and supports a gain-of-function or dominant-interfering effect.

      b) Morphological and division phenotyping:

      While time-lapse imaging of R671W-expressing cells was not available in our dataset, we acknowledge that analyses such as division angle or radial glial morphology would be informative. Unfortunately, we were unable to perform these with the current data, but we agree these are important goals for future work.

      Reviewer 2 conclusion:

      The resubmission has addressed many of the questions raised.

      I have a few comments that should be addressed:

      (1) The authors maintain a deficit in "migration of immature neurons" which remains unsubstantiated. In their response, they state: "we believe that the data showing the accumulation of migrating electroporated cells in the ventricular (V) and subventricular (SV) zones provide compelling evidence of abnormal migration in ABBA-shRNA electroporated cells. "

      - Firstly, they do not demonstrate that it's immature neurons, not RGs, that are affected. Secondly, accumulation of cells at the V-SVZ could be due solely to the inability of the RGC to undergo mitosis, therefore remaining stuck.

      The commentary of migration, especially of neurons, should be modified.

      We appreciate the reviewer’s careful reading and valid concern regarding our use of the term "migration of immature neurons." We fully agree that the current dataset does not definitively distinguish whether the accumulated cells in the ventricular (V) and subventricular (SV) zones are immature neurons or radial glial progenitors (RGPs) arrested in mitosis.

      To clarify, our observations indicate that electroporated cells accumulate in the VZ/SVZ following ABBA knockdown (Figures 2A and 4D), and this was interpreted as evidence of impaired migration. However, we now recognize that this accumulation may primarily reflect a block in cell cycle progression—specifically, at the stage of cleavage furrow ingression and abscission—rather than a migratory defect per se. This is supported by our new data using Anillin-GFP (Supplementary Figure 3), which show increased accumulation of cells with persistent Anillin expression, consistent with mitotic arrest. Furthermore, the detachment of apical and basal processes (also shown in Supplementary Figure 3) suggests that ABBA knockdown affects the structural integrity of RGPs, potentially compromising their scaffold function.

      In light of these points, we have revised the manuscript to temper our conclusions regarding “migration defects.” Specifically, we now refer to the phenotype as “abnormal accumulation of progenitor cells” and clarify that, while these findings are consistent with impaired cell progression or scaffolding required for migration, we do not directly demonstrate impaired migration of immature neurons. As suggested, addressing this would require additional analyses, such as time-lapse imaging of post-mitotic cells or staining with markers like Doublecortin, which are beyond the scope of the current dataset but will be a focus of future investigations.

      We thank the reviewer again for encouraging a more precise interpretation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Supplementary Fig 4B - The figure doesn't show an increase in percentage of PH3 positive cells in the NEDD9-shRNA condition. The control images are also missing for comparison. The figure legend needs to be corrected to match with the figure showing no significant changes.

      Thank you for this comment. This has been amended in the revised manuscript in the form of a new revised Supplementary Fig 4.

      Reviewer #2 (Recommendations for the authors):

      Minor annotations for slice culture assay

      The authors should make note of ages of slice cultures in text and have better annotations of slice cultures (for example, in Fig 4-where is mitosis?)

      We apologize for the mistake: the stage shown is not mitosis but the cleavage furrow stage. In addition, an amended Figure 4 is provided.

      The effects are hard to see in lower mag slice images in Fig. 6. Would recommend focusing on higher mag to highlight RG differences.

      Thank you for this comment. This has been amended in the revised manuscript in the form of a new revised Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Xiao et al. classified retroperitoneal liposarcoma (RPLS) patients into two subgroups based on whole transcriptome sequencing of 88 patients. The G1 group was characterized by active metabolism, while the G2 group exhibited high scores in cell cycle regulation and DNA damage repair. The G2 group also displayed more aggressive molecular features and had worse clinical outcomes compared to G1. Using a machine learning model, the authors simplified the classification system, identifying LEP and PTTG1 as the key molecular markers distinguishing the two RPLS subgroups. Finally, they validated these markers in a larger cohort of 241 RPLS patients using immunohistochemistry. Overall, the manuscript is clear and well-organized, with its significance rooted in the large sample size and the development of a classification method.

      Thank you for your positive assessment of our study on classifying RPLS patients based on whole transcriptome sequencing. We appreciate your recognition of the distinct characteristics of the G1 and G2 groups, as well as the significance of our simplified classification system and the identification of LEP and PTTG1 as key molecular markers. Your acknowledgment of the clarity and organization of our manuscript, along with the importance of the large sample size, is greatly appreciated. We will continue to refine our work based on your feedback as we prepare for resubmission.

      Weakness:

      (1) While the authors suggest that LEP and PTTG1 serve as molecular markers for the two RPLS groups, the process through which these genes were selected remains unclear. The authors should provide a detailed explanation of the selection process.

      The selection criteria for identifying LEP and PTTG1 as biomarkers involved selecting prognostic genes that were highly expressed in C1 and C2, respectively, and achieved the highest AUC value in distinguishing the two RPLS groups (Page 17, lines 288-290).

      (2) To ensure the broader applicability of LEP and PTTG1 as classification markers, the authors should validate their findings in one or two external datasets.

      We sincerely appreciate your insightful suggestion regarding the external validation of LEP and PTTG1 as classification biomarkers. To address this concern, we performed an independent validation using an external liposarcoma cohort (GSE30929; Page 6, Lines 104-105), which comprises 140 primary liposarcoma samples with annotated clinicopathological and survival data. This dataset was selected due to its relevance to RPLS (N=63, 45%) and the availability of distant recurrence-free survival (DRFS) outcomes, aligning with the clinical focus of our study.

      Applying our previously established prognostic model (Risk value = 2.182 × PTTG1 - 2.204 × LEP) to this cohort, we stratified patients into high- and low-risk groups using the median risk score as the cutoff. Consistent with our original findings, the high-risk group exhibited significantly worse DRFS compared to the low-risk group. The ROC curves based on the 1-, 3-, 5-year survival status of patients demonstrated that this model can effectively predict patient DRFS (log-rank P < 0.001, Figure S3A-B). Furthermore, the high-risk group demonstrated a higher proportion of high-grade histology (P < 0.001, Fisher’s exact test, Figure S3C-D).
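
      The stratification step described above can be illustrated with a minimal Python sketch. Only the two coefficients and the median cutoff come from the response. The function names, the sample identifiers, the expression values, and the rule that a score equal to the median is assigned to the low-risk group are assumptions made for illustration, and this is not the authors' code.

```python
from statistics import median

# Coefficients quoted in the response:
# Risk value = 2.182 x PTTG1 - 2.204 x LEP
PTTG1_COEF = 2.182
LEP_COEF = -2.204


def risk_value(pttg1, lep):
    """Linear risk score from PTTG1 and LEP expression values."""
    return PTTG1_COEF * pttg1 + LEP_COEF * lep


def stratify(samples):
    """Split samples into high/low risk at the median risk score.

    samples: dict mapping a sample id to a (pttg1, lep) tuple.
    Assumption: scores equal to the median are labelled "low".
    Returns (groups, cutoff).
    """
    scores = {sid: risk_value(p, l) for sid, (p, l) in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return groups, cutoff


# Hypothetical expression values, for illustration only.
example = {"s1": (1.0, 0.0), "s2": (0.0, 1.0), "s3": (1.0, 1.0), "s4": (2.0, 0.5)}
groups, cutoff = stratify(example)
```

      In a real analysis the two groups would then be compared with a survival method such as a log-rank test, which this sketch leaves out.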

      These results validate the robustness and generalizability of our risk stratification model across distinct liposarcoma cohorts. The external dataset’s alignment with our findings underscores the potential of LEP and PTTG1 as reproducible biomarkers for prognosis and therapeutic stratification in liposarcoma. We have incorporated these validation results into the revised manuscript (Page 18, Lines 305-315) to strengthen the clinical applicability of our conclusions.

      (3) Since molecular subtyping is often used to guide personalized treatment strategies, it is recommended that the authors evaluate therapeutic responses in the two distinct groups. Additionally, they should validate these predictions using cell lines or primary cells.

      We sincerely appreciate your insightful comments and suggestions regarding the evaluation of therapeutic responses and the validation of our predictions using cell lines or primary cells. We would like to address these points in detail below:

      (1) Purpose of the PTTG1- and LEP-based RPLS Classification Model

      The primary objective of our study was to develop a molecular subtyping model based on PTTG1 and LEP to guide personalized treatment strategies for patients with RPLS, particularly those classified as low-grade by traditional histopathological criteria but exhibiting poor prognosis. This subgroup of patients may benefit from more aggressive surgical resection, which is a potentially curative approach for RPLS. Our model aims to identify these high-risk patients to ensure complete tumor resection, thereby improving their clinical outcomes.

      (2) Therapeutic Response Evaluation in Distinct Groups

      In both our validation cohort and external validation cohort, surgical resection was the primary treatment modality for RPLS. After stratifying patients using our model, we observed significant differences in surgical outcomes between the two groups: the high-risk group exhibited poor prognosis, while the low-risk group showed favorable outcomes (Figure 5D-E and Figure S3A-B). Importantly, our model successfully identified low-grade histopathological cases with poor prognosis, who might otherwise be undertreated (Figure 5G-I and Figure S3C-D). By advocating for more thorough surgical resection in these high-risk patients, we aim to improve their prognosis. This achievement aligns with the primary goal of our study, which is to provide a molecular tool for personalized treatment guidance.

      (3) Future Validation and Functional Exploration of PTTG1 and LEP

      Our study has identified PTTG1 and LEP as key biomarkers for RPLS classification, and we recognize the urgent need to elucidate their molecular functions in RPLS pathogenesis. Here, we are pleased to report that we have already initiated cellular and animal experiments to investigate the roles of PTTG1 and LEP in RPLS. These experiments aim to validate our predictions and explore the underlying mechanisms by which these biomarkers contribute to tumor behavior and treatment response. We anticipate that the results of these studies will provide further mechanistic insights and will be submitted for publication in a suitable journal in the near future.

      Reviewer #2 (Public review):

      Surgical resection remains the most effective treatment for retroperitoneal liposarcoma. However, postoperative recurrence is very common and is considered the main cause of disease-related death. Considering the importance and effectiveness of precision medicine, the identification of molecular characteristics is particularly important for the prognosis assessment and individualized treatment of RPLS. In this work, the authors described the gene expression map of RPLS and illustrated an innovative strategy of molecular classification. Through the pathway enrichment of differentially expressed genes, characteristic abnormal biological processes were identified, and RPLS patients were simply categorized based on the two major abnormal biological processes. Subsequently, the classification strategy was further simplified through nonnegative matrix factorization. The authors finally narrowed the classification indicators to two characteristic molecules LEP and PTTG1, and constructed novel molecular prognosis models that presented obviously a great area under the curve. A relatively interpretable logistic regression model was selected to obtain the risk scoring formula, and its clinical relevance and prognostic evaluation efficiency were verified by immunohistochemistry. Recently, prognostic model construction has been a hot topic in the field of oncology. The interesting point of this study is that it effectively screened characteristic molecules and practically simplified the typing strategy on the basis of ensuring high matching clinical relevance. Overall, the study is well-designed and will serve as a valuable resource for RPLS research.

      Thank you for your insightful feedback on our manuscript. We appreciate your recognition of the importance of precision medicine and molecular characteristics in improving prognosis and individualized treatment for RPLS.

      We are pleased that you found our gene expression mapping and innovative molecular classification strategy valuable. Your positive remarks on our pathway enrichment analysis and the categorization of RPLS patients based on abnormal biological processes affirm our approach.

      We are also grateful for your acknowledgment of our focus on the characteristic molecules LEP and PTTG1, as well as the development of novel molecular prognosis models with significant predictive capability.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary

      In this manuscript, Dong et al. study the directed cell migration of tracheal stem cells in Drosophila pupae. The authors study how the directionality of these cells is regulated along the dorsal trunk. They show that inter-organ communication between the tracheal stem cells and the nearby fat body plays a role in posterior migration. They provide compelling evidence that Upd2 production in the fat body and JAK/STAT activation in the tracheal stem cells play a role. Moreover, they show that JAK/STAT signalling might induce the expression of apicobasal and planar cell polarity genes in the tracheal stem cells which appear to be needed to ensure unidirectional migration. Finally, the authors suggest that trafficking and vesicular transport of Upd2 from the fat body towards the tracheal cells might be important.

      Strengths

      The manuscript is well written and presents extensive and varied experimental data to show a link between Upd2-JAK/STAT signaling from the fat body and tracheal progenitor cell migration. The authors provide convincing evidence that the fat body, located near the trachea, secretes vesicles containing the Upd2 cytokine and that affecting JAK-STAT signaling results in aberrant migration of some of the tracheal stem cells towards the anterior. Using ChIP-seq as well as analysis of GFP-protein trap lines of planar cell polarity genes in combination with RNAi experiments, the authors show that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells which appear to be needed for unidirectional migration. The work presented here provides some novel insights into the mechanism that ensures polarized migration of tracheal stem cells, preventing bidirectional migration. This might have important implications for other types of directed cell migration in invertebrates or vertebrates including cancer cell migration. Overall, the authors have substantially improved their manuscript since the first submission but there are still some weaknesses.

      Weaknesses

      Overall, the manuscript lacks insights into the potential significance of the observed phenotypes and of the proposed new signaling model. Most of our concerns could be dealt with by adjusting the text (explaining some parts better and toning down some statements).

      (1) Directional migration of tracheal progenitors is only partially compromised, with some cells migrating anteriorly and others maintaining their posterior migration, a quite discrete phenotype.

      The strongest migration defects quantified in graphs (e.g. 100 μm) are not shown in images, since they would be out of frame, it would be beneficial to see them. In addition, the consequence of defects in polarized migration on tracheal development is not clear and data showing phenotypes on the final trachea morphology in pupae are not explained nor linked to the previous phenotypes.

      We agree with you that it is informative to show strong anterior migration (> 100 μm). Accordingly, we have shown examples in Figure 3B and Figure 7R-S. In addition, we have discussed the links between the migration defects and the consequent phenotypes of the animal at a later developmental stage in the revised manuscript. The undisciplined migration leads to insufficient regeneration and incomplete remodeling of the airway and causes pupal lethality.

      (2) Some important information is lacking, such as the origin of mutant and UAS-RNAi lines, which are not reported in the material and methods. For instance, mutants for components of the JAK-STAT pathway are used but not described. Are they all viable at the pupal stage? Otherwise, pupae would not be homozygous mutants. From the figure legend, it seems that the Stat92EF allele has been used, which is a point mutation, thus not leading to an absence of protein. If the hopTUM allele has been used, as mentioned in the legend, it is a gain-of-function allele. Thus, the authors should not conclude that "The aberrant anterior migration of tracheal progenitors in the absence of JAK/STAT components led to impairment of tracheal integrity and caused melanization in the trachea (Figure 3-figure supplement 1E-I)".

      We apologize for the inadequate description of the experimental materials and methods. We have listed the stock numbers of the mutant and RNAi alleles in the Key resource table and Materials. The mutant alleles that we chose to examine can survive to the pupal stage, which is key to the success of our subsequent characterization of these mutants. Following your suggestion, we have modified the statement for accuracy.

      (3) The authors observe that tracheal progenitors display a polarized distribution of Fat that is controlled by JAK-STAT signaling. However, this conclusion is made from a single experiment using only 3 individuals with no statistics. This is insufficient to support the claim that "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity leading to asymmetric localization of Fat in progenitor cells", as mentioned in the abstract, or that "the activated tracheal progenitors establish a disciplined migration through the asymmetrical distribution of polarity proteins which is directed by an Upd2-JAK/STAT signaling stemming from the remote organ of fat body."

      We performed multiple biological replicates for the Ft distribution experiments and observed a similar trend, although we showed only three representative samples. In the revised text, we have included the n numbers and the statistical tests.

      (4) The authors demonstrate that Upd2 is transported through vesicles from the fat body to the tracheal progenitors. It remains somewhat unclear in the proposed model how Upd2 activates JAK-STAT signaling. Are vesicles internalized, as it seems to be proposed, and thus how does Upd2 activate JAK-STAT signaling intracellularly? Or is Upd2 released from vesicles to bind Dome extracellularly to activate the JAK-STAT pathway? Moreover, it is not clear nor discussed what would be the advantage of transporting the ligand in vesicles compared to classical ligand diffusion.

      We do not know whether the association between Upd2 and Lbm is inside or outside vesicles. The vesicular trafficking of Upd2 is our observation and is supported by various genetic and biochemical experiments. Our study does not imply that this vesicular trafficking has an advantage over diffusion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Idiopathic scoliosis (IS) is a common spinal deformity. Various studies have linked genes to IS, but underlying mechanisms are unclear such that we still lack understanding of the causes of IS. The current manuscript analyzes IS patient populations and identifies EPHA4 as a novel associated gene, finding three rare variants in EPHA4 from three patients (one disrupting splicing and two missense variants) as well as a large deletion (encompassing EPHA4) in a Waardenburg syndrome patient with scoliosis. EPHA4 is a member of the Eph receptor family. Drawing on data from zebrafish experiments, the authors argue that EPHA4 loss of function disrupts the central pattern generator (CPG) function necessary for motor coordination.

      The main strength of this manuscript is the human genetic data, which provides convincing evidence linking EPHA4 variants to IS. The loss of function experiments in zebrafish strongly support the conclusion that EPHA4 variants that reduce function lead to IS.

      The conclusion that disruption of CPG function causes spinal curves in the zebrafish model is not well supported. The authors' final model is that a disrupted CPG leads to asymmetric mechanical loading on the spine and, over time, the development of curves. This is a reasonable idea, but currently not strongly backed up by data in the manuscript. Potentially, the impaired larval movements simply coincide with, but do not cause, juvenile-onset scoliosis. Support for the authors' conclusion would require independent methods of disrupting CPG function and determining if this is accompanied by spine curvature. At a minimum, the language of the manuscript could be toned down, with the CPG defects put forward as a potential explanation for scoliosis in the discussion rather than as something this manuscript has "shown". An additional weakness of the manuscript is that the zebrafish genetic tools are not sufficiently validated to provide full confidence in the data and conclusions.

      We highly appreciate the reviewer’s insightful comments and the acknowledgment of the main values of our study. We agree with the reviewer that further experiments are needed to fully establish the relationship between CPG and scoliosis. In response, we have revised the conclusion in the manuscript to better reflect this. Additionally, we conducted further analyses on the mutants to provide additional evidence supporting this concept.

      Reviewer #1 (Recommendations for the authors):

      Epha4a mutant zebrafish exhibited mild spinal curves, mostly laterally and in the tail. This was 75% of homozygous mutants but also, surprisingly, about 20% of heterozygotes. epha4b mutants also developed some mild scoliosis. If the two zebrafish paralogs can compensate for each other (partial redundancy), we might expect more severe scoliosis in double mutants. Did the authors generate and analyze double mutants? I believe it would be very useful for this study to report the zebrafish phenotype of loss of both paralogs together.

      We appreciate the reviewer’s insightful comment regarding the potential value of reporting the phenotype of epha4a/epha4b double mutants. While we fully agree that this analysis would be valuable, our attempts to generate double mutants have been unsuccessful. These two genes are closely linked on the chromosome, with less than 100 kb separating them, which makes it challenging to generate double mutants through standard genetic crossing. Establishing a double mutant line would require more than a year due to the technical constraints of the process. Although we are unable to address this question directly at this time, we hypothesize that epha4a/epha4b double mutants may exhibit a higher likelihood of body axis abnormalities based on the phenotypes observed in single mutants and the known functions of these genes.

      We hope this perspective will provide some useful context despite the limitations.

      In Figure 1F, a pCDK5 western blot is performed as a readout of EPHA4 signaling after either WT or C849Y mutant EPHA4 is transfected into HEK 293T cells. It would be useful to mention in the text, or at least the figure legend, how this experiment was performed/where the protein samples came from. It is included in the methods, but in the main text, it simply says "we conducted western blotting" without mentioning whether the protein samples were from cell lines, patients, or another source.

      We apologize for the oversight. A detailed description of the western blotting procedure has been added to both the “Results” section (page 8, lines 187-190) and the Figure 1 legend.

      Was the relative turn angle biased to the left or right side of the fish? (i.e. is a positive angle a rightward or leftward turn?)

      We are sorry for our unclear description. In Figure 3D, a positive angle means turning left, while a negative angle means turning right. In wild-type larvae, the average turning angle over a 4-minute period is approximately 0, whereas in mutants, this value deviates from 0, indicating a directional preference (positive for leftward and negative for rightward turns) in swimming behavior during the recording period. We have also added this clarification to the text and figure legend.
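
      The sign convention above can be made concrete with a short Python sketch. Positive angles are leftward turns and negative angles are rightward turns, as stated for Figure 3D. The function name, the example angles, and the 1-degree tolerance used to decide that a mean is "approximately 0" are hypothetical choices for illustration and are not taken from the manuscript.

```python
from statistics import mean


def turning_bias(relative_angles_deg, tolerance_deg=1.0):
    """Summarize the directional preference of one larva.

    relative_angles_deg: relative turn angles from one recording,
        positive = left turn, negative = right turn.
    tolerance_deg: hypothetical threshold below which the mean
        angle is treated as showing no preference.
    Returns (mean_angle, label) with label "left", "right" or "none".
    """
    avg = mean(relative_angles_deg)
    if abs(avg) <= tolerance_deg:
        return avg, "none"
    return avg, ("left" if avg > 0 else "right")


# Hypothetical recordings: balanced turns versus a leftward bias.
balanced = turning_bias([10.0, -10.0, 5.0, -5.0])
left_biased = turning_bias([12.0, 8.0, -2.0])
```

      A mean close to zero corresponds to the wild-type pattern described above, while a mean that deviates from zero in either direction corresponds to the directional preference reported for the mutants.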

      In Figure 4, morpholinos rather than mutants are used, but it is not clear why. Has it been established that the MO used disrupts gene function specifically? Can the effect of the MO be rescued by expressing a wild-type mRNA of Epha4a? Does MO knockdown induce spinal curves if fish are raised? Indeed, this could be a way to determine whether the spinal curves are caused by early events in development (when MOs are active).

      Thanks for the comments. The efficacy of the relevant MOs has been well documented in numerous previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Following this reviewer’s suggestion, we raised the epha4a morphants to adulthood, and no scoliosis was observed, suggesting that spinal curvature formation may be induced by long-term defects in the absence of Epha4a. Additionally, we reconfirmed the abnormal motor neuron activation frequency phenotype in the mutant background. The corresponding data have replaced the original Figure 4 in the manuscript.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Reviewer #2 (Recommendations for the authors):

      Supplementary Table 3 is missing.

      We apologize for the inconvenience. Because of the size of Supplementary Table 3, we have uploaded it separately as an Excel file in the supplementary materials, and we double-checked that it was included when resubmitting the revised manuscript. Thank you for your thorough review.

      The authors report only a single mutant allele for zebrafish epha4a and epha4b. Additionally, they provide no information about how many generations each allele has been outcrossed. The authors should provide some type of validation that the phenotypes they describe result from loss of function of the targeted gene and not from an off-targeting event.

      Thanks for the comments. For epha4a and epha4b mutants, each homozygous mutant was initially derived from the self-crossing of first filial generation heterozygotes, and subsequent homozygous generations were maintained for fewer than three rounds of in-crossing. Interestingly, we observed a reduction in the incidence of scoliosis across successive generations. This trend may be attributed to potential genetic compensation mechanisms, which could mitigate the phenotypic severity over time. To address concerns about possible off-target effects, we synthesized and injected epha4a mRNA to test for phenotypic rescue. Our data show that epha4a mRNA injection partially restored swimming coordination in the mutants (Fig. S5). Moreover, similar motor coordination defects have been reported in Epha4-deficient mice, as documented in previous studies (Kullander et al., 2003; Borgius et al., 2014). These findings collectively strengthen the hypothesis that Epha4a plays a critical role in regulating motor coordination.

      References

      (1) Borgius, L., Nishimaru, H., Caldeira, V., Kunugise, Y., Low, P., Reig, R., Itohara, S., Iwasato, T., and Kiehn, O. (2014). Spinal glutamatergic neurons defined by EphA4 signaling are essential components of normal locomotor circuits. J Neurosci 34, 3841-3853.

      (2) Kullander, K., Butt, S.J., Lebret, J.M., Lundfald, L., Restrepo, C.E., Rydstrom, A., Klein, R., and Kiehn, O. (2003). Role of EphA4 and EphrinB3 in local neuronal circuits that control walking. Science 299, 1889-1892.

      The authors need to provide allele designations for the mutant alleles following accepted nomenclature guidelines.

      Thank you for your careful review! We have reviewed and made revisions to the genes and mutation symbols throughout the entire text.

      The three antisense morpholino oligonucleotides need to be validated for efficacy and specificity.

      Thanks for the comments. The efficacy of these morpholinos has been thoroughly validated in multiple previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Furthermore, we performed swimming behavior analysis in the mutant background, which showed results similar to those in the morphants. We also performed rescue experiments to confirm the specificity of the mutants (Fig. S5). Finally, we reconfirmed the abnormal calcium signaling in the mutants (Fig. 4), which further supports our previous knockdown results.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Line 229. "While in consistent with previous reports, the hindbrain rhombomeric boundaries were found to be defective....". This sentence is not clear. Please describe how it is "inconsistent".

      Thanks for the comments, and sorry for the unclear description. We have described this more clearly in our revised manuscript (page 9, lines 229-230).

      Animals frequently are described as "heterozygous mutants" or "mutants". Please make clear that the latter are homozygous mutant animals.

      Thanks for the comments. In the manuscript, all references to mutants specifically indicate homozygous mutants. Heterozygous mutants are explicitly identified as such.

      The chromatin interaction portion of the Methods does not include any information on how these experiments were conducted or where the data were obtained. This information needs to be provided.

      Thanks for your advice. The detailed information of chromatin interaction mapping has been provided in “Methods and Materials” (page 18-19, line 450-455). Information about the interacting regions was derived from Hi-C datasets of 21 tissues and cell types provided by GSE87112. The significance of interactions for Hi-C datasets was computed by Fit-Hi-C, with an FDR ≤ 10⁻⁶ considered significant.
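
      As a rough illustration of the significance filter described above, the following Python sketch keeps only interactions whose FDR is at or below 10⁻⁶. The record layout, the field names `bin_a`, `bin_b` and `fdr`, and the example values are hypothetical. They do not reproduce the Fit-Hi-C output format or the GSE87112 data.

```python
# Threshold quoted in the response: FDR <= 10^-6 is considered significant.
FDR_THRESHOLD = 1e-6


def significant_interactions(interactions, threshold=FDR_THRESHOLD):
    """Return the interactions whose FDR is at or below the threshold.

    interactions: list of dicts with hypothetical keys
        "bin_a", "bin_b" (interacting regions) and "fdr".
    """
    return [row for row in interactions if row["fdr"] <= threshold]


# Made-up records, for illustration only.
records = [
    {"bin_a": "region1", "bin_b": "region2", "fdr": 3e-9},
    {"bin_a": "region1", "bin_b": "region3", "fdr": 1e-6},
    {"bin_a": "region4", "bin_b": "region5", "fdr": 2e-4},
]
kept = significant_interactions(records)
```

      The comparison is inclusive, so an interaction with an FDR of exactly 10⁻⁶ is retained, matching the "≤" in the stated criterion.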

      The authors present single-cell RNA-seq data in Supplementary Figure 5 for which they cite Cavone et al, 2021. This seems like an odd database to use. Can the authors provide an explanation for choosing it? In any case, the citation should also be made in the Supplementary Figure 5 legend.

      Thank you for your rigorous comment. We have cited this reference in the appropriate place in the revised manuscript. Cavone et al. used the her4.3:GFP line to label ependymo-radial glia (ERG) progenitor cells and performed single-cell RNA-seq on FACS-isolated fluorescent cells. The isolated cells included not only ERG progenitors but also undifferentiated and differentiated neurons and oligodendrocytes. The authors attributed this to the relative stability of the GFP protein, which remained in the progeny of GFP-expressing her4.3+ ERG progenitor cells, thus effectively acting as a short-term cell lineage tracer. Indeed, clustering analysis of these data successfully identifies neural progenitors and other neural clusters. Therefore, we consider that this scRNA-seq dataset encompasses a comprehensive range of neural cell types and is suitable for analyzing the expression of genes of interest. Furthermore, we downloaded and analyzed the scRNA-seq data of the zebrafish nervous system reported by Scott et al. in 2021 (Fig. S7B) (Scott et al., 2021). Despite differences in the developmental stages of the larvae analyzed (Cavone et al. examined larvae at 4 dpf, whereas Scott et al. analyzed larvae at 24, 36, and 48 hpf), our findings are consistent. Specifically, epha4a and epha4b are expressed in interneurons, whereas efnb3a and efnb3b are enriched in floor plate cells.

      References

      (1) Scott, K., O'Rourke, R., Winkler, C.C., Kearns, C.A., and Appel, B. (2021). Temporal single-cell transcriptomes of zebrafish spinal cord pMN progenitors reveal distinct neuronal and glial progenitor populations. Dev Biol 479, 37-50.

      In Figure Legend 1, "expressed from the EPHA4-mutant plasmid" is not an accurate description of the experiment.

      Sorry for the previous inaccurate description. The description has been revised to accurately reflect the experiment. “Western blot analysis of EPHA4-c.2546G>A variant showing the protein expression levels of EPHA4 and CDK5 and the amount of phosphorylated CDK5 (pCDK5) in HEK293T cells transfected with EPHA4-mutant or EPHA4-WT plasmid”.

      Figure 3 panels J and K need more explanation. I don't understand what the different colors represent nor do I understand what are wild type and what are mutant data.

      Thank you for your valuable feedback. We apologize for the lack of clarity in the original figure legend. To address this, we have revised the legend of Figure 3 to provide a more detailed explanation. In panels J and K, each color-coded curve represents the response of an individual larva from an independent experimental trial to the stimulus. Specifically, panel J depicts the response data for the wild-type larvae, whereas panel K presents the response data for the homozygous epha4a mutants.

      Please provide the genotypes for the images in Figure 5A.

      Thank you for the comment, and we apologize for the unclear description. We have now described this more clearly in Figure 5.

      Figure legend 6B should also note the heterozygote data with the wild type and homozygous mutant data.

      Thank you for the comment. The data are now included in Figure 6B.

      Epha4 and Efnb3 have well-established roles in axon guidance. Although this is noted in the Discussion, I think a more extensive description of prior findings would be helpful.

      Thanks for your valuable feedback. A more detailed description of the roles of Epha4 and Efnb3 in axon guidance has been provided in the “Discussion” (page 16, lines 388-396).

      The main conclusion of this manuscript is that EPHA4 variants cause IS by disrupting central pattern generator function. I think this is misleading. I think that the more valid conclusion is that EPHA4 loss of function causes axon pathfinding defects that impair locomotion by disrupting CPG activity, thereby leading to IS. I urge the authors to consider this more nuanced interpretation.

      Thank you for your insightful comments. We appreciate your suggestion to refine our main conclusion. We agree that the proposed revision more accurately reflects our findings and will revise the manuscript accordingly to state that “EPHA4 loss of function causes axon pathfinding defects, which impair locomotion by disrupting central pattern generator activity, potentially leading to IS.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission, however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now done the experiments that were suggested, see the new Fig. S3B: rescuing FLWR-1 in cholinergic neurons and in muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABA neurons, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to effects on the E/I imbalance caused by the absence of FLWR-1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due to a cell-autonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, we conclude that, along with the findings in aldicarb assays, the function of FLWR-1 in muscle is cell-autonomous, and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR, as otherwise, the flwr-1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocyte-specific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, the data shown in Fig. 3B represent GFP::FLWR-1 expressed under its own promoter and TagRFP::ELKS-1 expressed exclusively in GABAergic neurons. Given that pflwr-1 drives expression in both cholinergic and GABAergic neurons, and that cholinergic synapses outnumber GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neuron types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.
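
      For readers unfamiliar with this kind of line scan comparison, the sketch below shows one common way to obtain an R<sup>2</sup> value from two intensity profiles sampled at the same positions along a nerve cord. It assumes that R<sup>2</sup> means the squared Pearson correlation between the two channels; the response does not state the exact computation, and the function and variable names are hypothetical.

```python
def r_squared(profile_a, profile_b):
    """Squared Pearson correlation between two equally sampled intensity profiles.

    Assumption: this is one plausible definition of the R^2 mentioned in the
    response, not necessarily the one used for the figures.
    """
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov * cov / (var_a * var_b)
```

      Values near 1 indicate that the two signals rise and fall together along the scan, while values near 0 indicate no linear relationship; averaging per-scan values within each neuron type would then give the per-type comparison described above.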

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We have now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. This data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH externally, or buffers that penetrate the cell and fix intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons require its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR-1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it were a Ca<sup>2+</sup> channel. Of course the reviewer is right, FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even an accessory subunit, of the PMCA MCA-3. This interaction was indicated by a new experiment we added, where we found that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane, using bimolecular fluorescence complementation (see new Fig. 9A, B). This provides a reasonable explanation for findings we obtained, i.e. increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. Based on proteomics, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steady-state levels of PMCA in the PM. This could lead to reduced steady-state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower may also impact on Ca<sup>2+</sup> microdomains of the P/Q type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947-960) showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant could be responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the E74Q mutant in its ability to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that here a function of FLWR-1 affecting increased PI(4,5)P<sub>2</sub> levels is not linked to a channel function. It was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.
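
      To make the "50% paralysis time" compared between panels concrete, the sketch below estimates it from a paralysis time course by linear interpolation between the two scoring time points that bracket 50%. This is an assumed, simplified read-out with hypothetical names; the response does not specify how the value was derived from the curves.

```python
def time_to_half_paralysis(times, fractions, level=0.5):
    """Interpolated time at which the paralyzed fraction first reaches `level`.

    times: scoring time points (e.g. hours on aldicarb), in increasing order.
    fractions: fraction of animals paralyzed at each time point.
    Returns None if the level is never reached within the time course.
    """
    if len(times) != len(fractions):
        raise ValueError("times and fractions must have the same length")
    if not times:
        return None
    if fractions[0] >= level:
        return times[0]
    for i in range(1, len(times)):
        f0, f1 = fractions[i - 1], fractions[i]
        if f0 < level <= f1:
            t0, t1 = times[i - 1], times[i]
            # Straight-line interpolation between the bracketing points.
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    return None
```

      Because day-to-day variability shifts the whole curve, such an estimate is most informative when test strains and controls are scored in the same session, as described above.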

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: Possible mutations could be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmatic Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1 interacting with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA-2 in C. elegans), which helps assemble the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. This data is presented in Fig. 9A, B. Additionally, we show that FLWR-1 expression has a function in stabilizing MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that a GFP variant tagged with a PH domain would be recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While the wild type FLWR-1 rescued PH-GFP levels at the CC membrane to the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue the reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. Mechanistically, this is not clear at present, though we noted a possible mechanism as found for synaptotagmin, that recruits the PIP2 kinase to the plasma membrane via a lysine and arginine containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We mention this now in the discussion. We also discussed our data with respect to the findings of Li et al., about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than less bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence trying to better explain our reasoning. Mainly, the argument is that accumulation of such endosomes of unusually large size is seen in mutants affecting formation of SVs from the endosome (in endophilin and synaptojanin mutants), while mutants affecting mainly endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to the paper by Watanabe, 2013, eLife 2: e00723, endosomes are defined by their location in the synapse, and their size. However, this work used a much shorter stimulus and froze the preparations within a few dozen to hundreds of msec after the stimulus, while we used the protocol of Kittelmann 2013, which uses 30 sec stimulation and freezing after 5 sec. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron dense lumen, and were very rarely observed. In contrast, large vesicles or ‘100 nm vesicles’, which ranged from 50-200 nm in diameter and had a clear lumen, were morphologically similar to the bulk endosomes observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structures as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this conveys hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR-1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by AlphaFold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel, however, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observations of increased Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could be underlying the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E) and Bednarek et al., 2007 (Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce a large Cl<sup>-</sup> conductance in oocytes, which would likely be problematic or even kill the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. These data are now shown in Fig. 9A, B. We also now show that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings do not support the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We substantiated the interaction of FLWR-1 with the PMCA, and we also assessed the function of FLWR-1 in the coelomocytes and in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed such an analysis, comparing wild type to flwr-1 mutants and to flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C. There was no difference. We note that, in swimming assays as well, we observed differences only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, but not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming. The data is shown in Fig. S2S, E. While the GFP::FLWR-1 animals were slightly affected right after the photostimulation, they quickly caught up with the wild type controls, while flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e. loss of FLWR-1 in muscles increases muscular excitability, which becomes apparent in the behavioral assay that depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays, as shown in Fig. S3B. The double rescue was not different from wild type, and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay as swimming is driven by cholinergic neurons, meaning that we would only indirectly probe GABAergic neuron function and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, it would likely require expression levels that very precisely match endogenous levels, which is not possible in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, which made the outcome of the assay more conclusive: the double mutation did not exacerbate the effect of either single mutation, demonstrating that FLWR-1 and MCA-3 act in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Responses to final minor critiques following initial revision

      Reviewer #1 (Recommendations for the authors): 

      The authors have generally done an excellent job of addressing my and the other reviewers' concerns. I have a few additional concerns that the authors could consider addressing through changes to the text: 

      We thank the Reviewer for this assessment and are glad to have addressed the major points.

      - Regarding the gRNA used for NMR studies, I thank the authors for adding additional rationale for their design of the RNA used. However, I still believe that it is misleading to term this RNA as a "gRNA", given that it is mainly composed of a sequence that is arbitrary (the spacer) and the sections of the gRNA that are constant between all gRNAs are truncated in a way that removes secondary structure that is likely essential for specific contacts with the Rec domains. I do not believe the authors need to make alterations to any of their experiments. However, I do think their description of the "gRNA" should be updated to properly reflect that this RNA lacks any of the secondary structure present in a typical gRNA, much of which is necessary to confer specificity of binding between GeoCas9 and the gRNA. As mentioned in my previous review, this may be best achieved by adding a cartoon of the secondary structure of the full-length gRNA and highlighting the region that was used in the truncated "gRNA". 

      We understand the Reviewer’s point. For any experiment in which the gRNA was truncated (i.e. NMR or some MST studies), we have clarified the text and no longer call it a “gRNA.” We state initially that it is a portion of the gRNA and then call it simply an “RNA.” 

      For experiments using the full-length constructs, we have kept the term “gRNA,” as it remains appropriate.

      We have also added a final Supplementary figure (S12) showing the structures of the truncated and full-length RNAs used, based on the _Geo_Cas9 cryo-EM structure and predicted with RNAfold.

      - Lines 256-257: "The ~3-fold decrease in Kd...". I believe the authors are discussing the Kd's of the mutants relative to WT, in which case the Kd increased. Also, the fold-change appears closer to 2-fold than to 3-fold.

      Yes, the Reviewer makes a good catch. We have corrected this.

      - Lines 407-408: "The mutations also diminished the stability of the full-length GeoCas9 RNP complex." This statement seems at odds with the authors' conclusions in the Results section that the full-length GeoCas9 variants had comparable affinities for the gRNAs (lines 376-382) 

      We agree that this seems contradictory. In the absence of full-length structures for all variants, we can’t definitively state what causes this. It could be that the mutation has an interesting allosteric effect on structure that does not affect RNA binding but induces the Cas9 protein to simply fall apart at lower temperatures, rendering the binding interaction moot. We have added a statement to this section.

      - The authors chose to keep "SpCas9" for consistency with their prior work and the work of many others, including Doudna et al and Zhang et al. However, I will note that in their publications on GeoCas9, the Doudna lab did use SpyCas9 to ensure consistent nomenclature within the publications.

      We have made the change to “_Spy_Cas9”

      Reviewer #3 (Recommendations for the authors): 

      The authors clearly answered most of my concerns. I still have some technical questions about the analysis of CPMG-RD data but the numbers provided now seem to make sense. While I still think that crystal structures of the point mutant would make the conclusions more "bullet proof", I do appreciate the work associated with this and consider that the manuscript can be published as is. 

      We agree that additional magnetic fields could allow for additional models of CPMG data fitting and that additional crystal structures of the mutants could add to the conclusions. We appreciate the Reviewer recognizing the balance of the current results and potential future studies in signing off on publication.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Nakagawa and colleagues report the observation that YAP is differentially localized, and thus differentially transcriptionally active, in spheroid cultures versus monolayer cultures. YAP is known to play a critical role in the survival of drug-tolerant cancer cells, and as such, the higher levels of basally activated YAP in monolayer cultures lead to higher fractions of surviving drug-tolerant cells relative to spheroid culture (or in vivo culture). The findings of this study, revealed through convincing experiments, are elegantly simple and straightforward, yet they add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology simply because the abundance of residual cells in this format is much greater than in spheroid or xenograft models. The potential linkage between matrix density and stiffness and YAP activation, while only speculated upon in this manuscript, is intriguing and a rich starting point for future studies.

      Although this work, like any important study, inspires many interesting follow-on questions, I am limiting my questions to only a few minor ones, which may potentially be explored either in the context of the current study or in separate, follow-on studies.

      We appreciate Reviewer #1's comments that our work is of importance to the field and particularly that it will "...add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology..."  We have sought to highlight the importance of how our findings could be applied to study resistance mechanisms at various points in the manuscript.

      Strengths:

      The major strengths of the work are described above.

      Weaknesses:

      Rather than considering the following points as weaknesses, I instead prefer to think of them as areas for future study:

      (1) Given the field's intense interest in the biology and therapeutic vulnerabilities of residual disease cells, I suspect that one major practical implication of this work could be that it inspires scientists interested in working in the residual disease space to model it in monolayer culture. However, this relies upon the assumption that drug-tolerant cells isolated in monolayer culture are at least reasonably similar in nature to drug-tolerant cells isolated from spheroid or xenograft systems. Is this true? An intriguing experiment that could help answer this question would be to perform gene expression profiling on a cell line model in the following conditions: monolayer growth, drug tolerant cells isolated from monolayer growth conditions, spheroid growth, drug tolerant cells isolated from spheroid growth conditions, xenograft tumors, and drug tolerant cells isolated from xenograft tumors. What are the genes and programs shared between drug-tolerant cells cultured in the three conditions above? Which genes and programs differ between these conditions? Data from this exercise could help provide additional, useful context with which to understand the benefits and pitfalls of modeling residual tumor cell growth in monolayer culture.

      We thank the reviewer for suggesting valuable future studies. We agree that the proposed experiments represent important next steps in understanding the role of YAP and other pathways in primary resistance. We believe, however, these experiments are both beyond the scope of the current manuscript and beyond what can reasonably be addressed in a revision. The distinct challenges associated with comparing in vivo and in vitro conditions would require significant optimization of single-cell approaches, especially given the robust cell death driven by afatinib treatment in vivo. Given the complexity of in vivo experimentation, we are concerned that such studies may not guarantee biologically meaningful insights. Nonetheless, we agree that this is a compelling direction for future research. If common gene expression patterns could be identified despite these challenges, such studies could help validate monolayer culture as a relevant model for investigating residual disease.

      (2) In relation to the point above, there is an interesting and established connection between mesenchymal gene expression and YAP/TAZ signaling. For example, analyses of gene expression data from human tumors and cell lines demonstrate an extremely strong correlation between these two gene expression programs. Further, residual persister cancer cells have often been characterized as having undergone an EMT-like transition. From the analysis above, is there evidence that residual tumor cells with increased YAP signaling also exhibit increased mesenchymal gene expression?

      We agree with the reviewer that a connection between YAP/TAZ activity and EMT is likely, given prior studies exploring correlations between these two gene signatures. We believe, however, that exploring EMT represents a distinct research direction from the primary focus of the current manuscript. We are concerned that exploration of EMT, especially in the absence of corresponding preclinical models or mechanistic data directly linking EMT to therapy resistance in our models, could distract from the main conclusions of the manuscript. While we plan to stain for EMT-associated markers in the residual cancer tissue from the in vivo studies, it remains unclear whether such data would meaningfully contribute to the revised manuscript, regardless of the outcome.

      Reviewer #2 (Public review):

      The manuscript by Nakagawa R, et al describes a mechanism of how NSCLC cells become resistant to EGFR and KRAS G12C inhibition. Here, the authors focus on the initial cellular changes that occur to confer resistance and identify YAP activation as a non-genetic mechanism of acute resistance.

      The authors performed an initial xenograft study to identify YAP nuclear localization as a potential mechanism of resistance to EGFRi. The increase in the stromal component of the tumors upon Afatinib treatment leads the authors to explore the response to these inhibitors in both 2D and 3D culture. The authors extend their findings to both KRAS G12C and BRAF inhibitors, suggesting that the mechanism of resistance may be shared along this pathway.

      The paper would benefit from additional cell lines to determine the generalizability of the findings they presented. While the change in the localization of YAP upon Afatinib treatment was identified in a xenograft model, the authors do not return to animal models to test their potential mechanism, and the effects of the hyperactivated S127A YAP protein on Afatinib sensitivity in culture are modest. Also, combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies.

      We thank the reviewer for their insightful comments. In this manuscript, we present data from 5 cell lines representing the EGFR/BRAF/KRAS pathway, demonstrating the generalizability of YAP-driven decreased cancer cell sensitivity to targeted inhibitors when cultured in 2D compared to spheroid counterparts. While expanding this analysis to a larger panel of cell lines is beyond the scope of the current study, we believe our findings provide a strong rationale for future investigations, including high-throughput screens conducted by other research groups and pharmaceutical companies, to recognize the value in screening spheroid cell cultures. We hope this work helps shift the field of cancer therapeutics toward incorporating screening approaches that better reflect tumor biology into drug discovery pipelines, and we believe this could be one of the most impactful and enduring contributions of our study.

      Reviewer #2 also mentions that "...combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies..." The concept that YAP/TAZ inhibitors (i.e. TEAD inhibitors) could be additive or synergistic in 2D culture is one that is being actively tested across several groups and in pharma. Recent examples include a publication by Hagenbeek, et al., Nat. Cancer, 2023 (PMID: 37277530) showing that a TEAD inhibitor overcomes KRASG12C inhibitor resistance. Additionally, recent work by Pfeifer, et al., Comm. Biol., 2024 (PMID: 38658677) suggests a similar effect between EGFR inhibitors and a different TEAD inhibitor. While neither of these studies probes cell death pathways as extensively as our studies do, they nevertheless provide strong evidence that TEAD inhibition combined with targeted EGFR/RAF/RAS inhibition in 2D has additive, if not synergistic, effects. We feel that these recently published studies affirm our findings and that repeating such experiments is unlikely to add much new information. We thus feel they are beyond the scope of our present studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      Olfactory sensory neurons (OSNs) in the olfactory epithelium detect myriads of environmental odors that signal essential cues for survival. OSNs are born throughout life and thus represent one of the few neurons that undergo life-long neurogenesis. Until recently, it was assumed that OSN neurogenesis is strictly stochastic with respect to subtype (i.e. the receptor the OSN chooses to express).

      However, a recent study showed that olfactory deprivation via naris occlusion selectively reduced birthrates of only a fraction of OSN subtypes and indicated that these subtypes appear to have a special capacity to undergo changes in birthrates in accordance with the level of olfactory stimulation. These previous findings raised the interesting question of what type of stimulation influences neurogenesis, since naris occlusion does not only reduce the exposure to potentially thousands of odors but also to more generalized mechanical stimuli via preventing airflow.

      In this study, the authors set out to identify the stimuli that are required to promote the neurogenesis of specific OSN subtypes. Specifically, they aim to test the hypothesis that discrete odorants selectively stimulate the same OSN subtypes whose birthrates are affected. This would imply a highly specific mechanism in which exposure to certain odors can "amplify" OSN subtypes responsive to those odors suggesting that OE neurogenesis serves, in part, an adaptive function.

      To address this question, the authors focused on a family of OSN subtypes that had previously been identified to respond to musk-related odors and that exhibit higher transcript levels in the olfactory epithelium of mice exposed to males compared to mice isolated from males. First, the authors confirm via a previously established cell birth dating assay in unilateral naris occluded mice that this increase in transcript levels actually reflects a stimulus-dependent birthrate acceleration of this OSN subtype family. In a series of experiments using the same assay, they show that one specific subtype of this OSN family exhibits increased birthrates in response to juvenile male exposure while a different subtype shows increased birthrates to adult mouse exposure. In the core experiment of the study, they finally exposed naris occluded mice to a discrete odor (muscone) to test if this odor specifically accelerates the birth rates of OSN types that are responsive to this odor. This experiment reveals a complex relationship between birth rate acceleration and odor concentrations showing that some muscone concentrations affect birth rates of some members of this family and do not affect two unrelated OSN subtypes.

      In addition to the results nicely summarized by the reviewer, which focus on experiments to examine the effects of odor stimulation on unilateral naris occluded (UNO) mice, an important part of the present study is a set of experiments on non-occluded (i.e., non-UNO-treated) mice. These experiments show that: 1) the exposure of non-occluded mice to odors from adolescent male mice selectively increases quantities of newborn OSNs of the musk-responsive subtype Olfr235 (Figure 3G, H; previously Figure 6), 2) the exposure of non-occluded female mice to 2 different musk odorants (muscone, ambretone) selectively increases quantities of newborn OSNs of 3 musk-responsive subtypes: Olfr235, Olfr1440 and Olfr1431 (Figure 4D-F; previously Figure 6), and 3) the exposure of non-occluded adult female mice to musk odorants selectively increases quantities of newborn OSNs of musk-responsive subtypes (Figure 5; previously Fig. S7). We have reorganized the revised manuscript to more prominently and clearly present the experimental design and findings of these experiments. We have also made changes to clarify (via schematics) the experimental conditions used (i.e., UNO, non-UNO, odor exposure) in each experiment.

      Strengths:

      The scientific question is valid and opens an interesting direction. The previously established cell birth dating assay in naris occluded mice is well performed and accompanied by several control experiments addressing potential other interpretations of the data.

      Weaknesses:

      (1) The main research question of this study was to test if discrete odors specifically accelerate the birth rate of OSN subtypes they stimulate, i.e. does muscone only accelerate the birth rate of OSNs that express muscone-responsive ORs, or vice versa is the birthrate of muscone-responsive OSNs only accelerated by odors they respond to?

      This question is only addressed in Figure 5 of the manuscript and the results only partially support the above claim. The authors test one specific odor (muscone) and find that this odor (only at certain concentrations) accelerates the birth rate of some musk-responsive OSN subtypes, but not two other unrelated control OSN subtypes. This does not at all show that musk-responsive OSN subtypes are only affected by odors that stimulate them and that muscone only affects the birthrate of musk-responsive OSNs, since first, only the odor muscone was tested and second, only two other OSN subtypes were tested as controls, that, importantly, are shown to be generally stimulus-independent OSN subtypes (see Figure 2 and S2).

      As a minimum the authors should have a) tested if additional odors that do not activate the three musk-responsive subtypes affect their birthrate and b) chosen 2-3 additional control subtypes that are known to be stimulus-dependent (from their own 2020 study) and tested if muscone affects their birthrates.

      We appreciate these suggestions. Within the revised manuscript, we have described and included the results from several new experiments:

      (1) As noted by the reviewer, we had previously tested the effects of exposure to only one exogenous musk odorant, muscone, on quantities of newborn OSNs of the musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431. To test whether the effects observed with muscone exposure occur with other musk odorants, we assessed the effects of exposure to ambretone (5-cyclohexadecenone), a musk odorant previously found to robustly activate musk-responsive OSNs (Sato-Akuhara et al., 2016; Shirasu et al., 2014), on quantities of newborn OSNs of 3 musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431, as well as the SBT-responsive subtype Olfr912, in the OEs of non-occluded female mice. Exposure to ambretone was found to significantly increase quantities of newborn OSNs of all 3 musk-responsive subtypes (Figure 4D-F) but not the SBT-responsive subtype (Figure 4–figure supplement 4C-left), indicating that a variety of musk odorants can accelerate the birthrates of musk responsive subtypes.

      (2) To verify that exogenous non-musk odors do not increase quantities of newborn OSNs of musk responsive OSN subtypes (point a, above), we quantified newborn OSNs of 3 musk-responsive subtypes, Olfr235, Olfr1440, and Olfr1431, in non-occluded female mice that were exposed to the non-musk odorants SBT or IAA. As expected, neither of these odorants significantly affected the birthrates of the subtypes tested (Figure 4D-F).

      (3) To confirm that exogenous musk odors do not accelerate the birthrates of non-musk responsive OSN subtypes that were previously found to undergo stimulation-dependent neurogenesis (point b, above), we quantified newborn OSNs of 2 such subtypes, Olfr827 and Olfr1325, in non-occluded female mice that were exposed to muscone. As expected, exposure to muscone did not significantly affect the birthrates of either of these subtypes (Figure 4–figure supplement 4C-middle, right).

      (4) To provide additional confirmation that only some OSN subtypes have a capacity to exhibit increases in newborn OSN quantities in the presence of odors that activate them, we compared quantities of newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were exposed to 0.1% SBT versus unexposed controls. As expected, exposure to SBT caused no significant increase in quantities of newborn Olfr912 OSNs (Figure 4–figure supplement 4C-left).

      (2) The finding that Olfr1440 expressing OSNs do not show any increase in UNO effect size under any muscone concentration (Figure 5D, no significance in line graph for UNO effect sizes, middle) seems to contradict the main claim of this study that certain odors specifically increase birthrates of OSN subtypes they stimulate. It was shown in several studies that olfr1440 is seemingly the most sensitive OR for muscone, yet, in this study, muscone does not further increase birthrates of OSNs expressing olfr1440. The effect size on birthrate under muscone exposure is the same as without muscone exposure (0%).

      In contrast, the supposedly second most sensitive muscone-responsive OR olfr235 shows a significant increase in UNO effect size between no muscone exposure (0%) and 0.1% as well as 1% muscone.

      The finding that quantities of newborn Olfr1440 OSNs do not show a significantly greater UNO effect size in the OEs of mice exposed to muscone compared to control mice was also somewhat surprising to us. We think that there are two potential explanations for this result: 1) Unlike subtype Olfr235, subtype Olfr1440 exhibits a significant open-side bias in newborn OSN quantities in UNO-treated adolescent females even in the absence of exposure to muscone. We speculate that this subtype (as well as subtype Olfr1431) is stimulated by odors that are emitted by female mice at the adolescent stage, and/or by another environmental source. This may limit the influence of muscone exposure on the UNO effect size. 2) There is compelling evidence that odors within the environment can enter the closed side of the OE transnasally [via the nasopharyngeal canal (Kelemen, 1947)] and/or retronasally (via the nasopharynx) in UNO-treated mice [reviewed in (Coppola, 2012)]. Thus, it is conceivable that chronic exposure of UNO-treated mice to muscone results in the eventual entry on the closed side of the OE of muscone at concentrations sufficient to promote neurogenesis. If Olfr1440 is more sensitive to muscone than Olfr235 [e.g., (Sato-Akuhara et al., 2016; Shirasu et al., 2014)], OSNs of this subtype may be especially sensitive to small amounts of odors that enter the closed side of the OE transnasally and/or retronasally. These explanations are supported by the following results:

      - UNO-treated females exposed to 0.1% muscone show higher quantities of newborn Olfr1440 OSNs on both the open and closed sides of the OE compared to their unexposed counterparts (Figure 4–figure supplement 1A-middle). Similar results were also observed for newborn Olfr235 OSNs (Figure 4C-middle), albeit to a lesser extent, perhaps due to the lower sensitivity of this subtype to muscone.

      - In non-occluded female mice, exposure to 0.1% muscone was found to significantly increase quantities of newborn Olfr1440 OSNs, as well as newborn Olfr235 and Olfr1431 OSNs (Figure 4D-F in revised manuscript; Figure 6 in original version). Similar results were also observed upon exposure to ambretone, another musk odor (Figure 4D-F). These experiments strongly support the hypothesis that musk odors selectively increase birthrates of OSN subtypes that they stimulate.

      We have addressed these points within the results section of the revised manuscript.

      (3) The authors introduce their choice to study this particular family of OSN subtypes with first, the previous finding that transcripts for one of these musk-responsive subtypes (olfr235) are downregulated in mice that are deprived of male odors. Second, musk-related odors are found in the urine of different species. This gives the misleading impression that it is known that musk-related odors are indeed excreted into male mouse urine at certain concentrations. This should be stated more clearly in the introduction (or cited, if indeed data exist that show musk-related odors in male mouse urine) because this would be a very important point from an ethological and mechanistic point of view.

      In addition, this would also be important information to assess if the chosen muscone concentrations fall at all into the natural range.

      These are important points, which we have addressed within the revised manuscript:

      (1) Within the introduction, we have now stated that the emission of musk odors by mice has not been documented. We have also added extensive discussions of what is known about the emission of musk odors by mice in a new subsection within Results, as well as within the Discussion section. Most prominently, we have cited one study (Sato-Akuhara et al., 2016) that noted unpublished evidence for the emission of Olfr1440-activating compounds from male preputial glands: “Indeed, our preliminary experiments suggest that there are unidentified compounds that activate MOR215-1 in mouse preputial gland extracts.” Another study, which used histomorphology, metabolomic and transcriptomic analyses to compare the mouse preputial glands to muskrat scent glands, found that the two glands are similar in many ways, including molecular composition (Han et al., 2022). However, the study did not identify known musk compounds within mouse preputial glands.

      (2) Based on the reviewer’s feedback and our own curiosity, we used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk odorants, particularly those known to activate Olfr235 and Olfr1440 (Sato-Akuhara et al., 2016). Although we were unable to find evidence for known musk odorants in mouse urine extracts (possibly due to insufficient sensitivity of the assay employed), we found that preputial gland extracts contain GC-MS signals that are structurally consistent with known musk odorants. A limitation of this approach, however, is that the conclusive identification of specific musk odorants in extracts derived from mouse urine and tissues requires comparisons to pure standards, many of which we could not readily obtain. For example, we were unable to obtain a pure sample of cycloheptadecanol, a musk molecule with a predicted potential match to a signal identified within preputial gland extracts. Another limitation is that although several known musk odorants have been found to activate Olfr235 and Olfr1440 OSNs, it is conceivable that structurally distinct odorants that have not yet been identified might also activate them. The findings from these experiments have been included in a new figure within the revised manuscript (Appendix 2–figure 1).

      Related: If these are male-specific cues, it is interesting that changes in OR transcripts (Figure 1) can already be seen at the age of P28 where other male-specific cues are just starting to get expressed. This should be discussed.

      We agree that the observed changes in quantities of newborn OSNs of musk-responsive subtypes in mice exposed to juvenile male odors deserve additional discussion. We have included a more extensive discussion of this observation in both the Results and Discussion sections of the revised manuscript.

      (4) Figure 5: Under muscone exposure the number of newborn neurons on the closed sides fluctuates considerably. This doesn't seem to be the case in other experiments and raises some concerns about how reliable the naris occlusion works for strong exposure to monomolecular odors or what other potential mechanisms are at play.

      We agree that the variability in quantities of newborn OSNs of musk-responsive subtypes on the closed side of the OE of UNO-treated mice deserves further discussion. As noted above, we suspect that these fluctuations are due, at least in part, to transnasal and/or retronasal odor transfer via the nasopharyngeal canal (Kelemen, 1947) and nasopharynx, respectively [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed OE to odor concentrations that rise with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 and Olfr1440 OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440) (Figure 4C-middle, Figure 4–figure supplement 1A-middle). It is conceivable that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone reflect overstimulation-dependent reductions in survival. Our findings from UNO-based experiments are consistent with expectations that naris occlusion does not completely block exposure to odorants on the closed side, particularly at high concentrations. However, they also appear consistent with the hypothesis that exposure to musk odors promotes the neurogenesis of musk-responsive OSN subtypes.

      Considering the limitations of the UNO procedure, it is important to note that the present study also includes experimental exposure of non-occluded animals to both male odors (Figure 3G, H) and exogenous musk odorants (Figures 4D-F). Findings from the latter experiments provide strong evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included within the Results section of the revised manuscript a discussion of how observed effects of muscone exposure of UNO-treated mice may be influenced by transnasal/ retronasal odor transfer to the closed side of the OE.

      (5) In contrast to all other musk-responsive OSN types, the number of newborn OSNs expressing olfr1437 increases on the closed side of the OE relative to the open in UNO-treated male mice (Figure 1). This seems to contradict the presented theory and also does not align with the bulk RNAseq data (Figure S1).

      Subtype Olfr1437 is indeed an outlier among musk-responsive subtypes that were previously found to be more highly represented in the OSN population in 6-month-old sex-separated males compared to females (Appendix 1–figure 1) (C. van der Linden et al., 2018; Vihani et al., 2020). Somewhat unexpectedly, our findings from scRNA-seq experiments show slightly greater quantities of immature Olfr1437 OSNs on the closed side of the OE in juvenile males (Figure 1D, E of the revised manuscript, which now includes data from a second OE). Perhaps more informatively, considering the small number of iOSNs of specific subtypes in the scRNA-seq datasets, EdU birthdating experiments show no difference in newborn Olfr1437 OSN quantities on the two sides of the OE from UNO-treated juvenile males (Figure 2G). It is unclear to us why subtype Olfr1437 does not show open-side biases in newborn OSN quantities in juvenile male mice, but potential explanations include:

      - Age: Findings based on bulk RNA-seq that musk responsive OSN subtypes are more highly represented in mice exposed to male odors analyzed mice that were 6 months old (C. van der Linden et al., 2018) or > 9 months old (Vihani et al., 2020) at the time of analysis. By contrast, the present study primarily analyzed mice that were juveniles (PD 28) at the time of scRNA-seq analysis (Figure 1) or EdU labeling (Figure 2G). It is conceivable that different musk-responsive subtypes are selectively responsive to distinct odors that are emitted at different ages. In this scenario, odors that increase the birthrates of Olfr235, Olfr1440, and Olfr1431 OSNs may be emitted starting at the juvenile stage, while those that increase the birthrate of Olfr1437 OSNs may be emitted in adulthood. In potential support of this, juvenile males exposed to their adult parents at the time of EdU labeling showed a slightly greater (although not statistically significantly different) UNO effect size in quantities of newborn Olfr1437 OSNs compared to controls (Figure 3–figure supplement 3).

      - Capacity for stimulation-dependent neurogenesis: It is also conceivable that, unlike other musk-responsive OSN subtypes, Olfr1437 OSNs lack the capacity for stimulation-dependent neurogenesis (like the SBT-responsive subtype Olfr912, for example). If so, this would imply that increased representations of Olfr1437 OSNs observed in mice exposed to male odors for long periods (C. van der Linden et al., 2018; Vihani et al., 2020) may be due to male odor-dependent increases in the lifespans of Olfr1437 OSNs.

      Within the Discussion section of the revised manuscript, we have discussed the findings concerning Olfr1437.

      (6) The authors hypothesize in relation to the accelerated birthrate of musk-responsive OSN subtypes that "the acceleration of the birthrates of specific OSN subtypes could selectively enhance sensitivity to odors detected by those subtypes by increasing their representation within the OE". However, for two other OSN subtypes that detect male-specific odors, they hypothesize the opposite "By contrast, Olfr912 (Or8b48) and Olfr1295 (Or4k45), which detect the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT) and (methylthio)methanethiol (MTMT), respectively, exhibited lower representation and/or transcript levels in mice exposed to male odors, possibly reflecting reduced survival due to overstimulation."

      Without any further explanation, it is hard to comprehend why exposure to male-derived odors should, on one hand, accelerate birthrates in some OSN subtypes to potentially increase sensitivity to male odors, but on the other hand, lower transcript levels and does not accelerate birth rates of other OSN subtypes due to overstimulation.

      We agree that this point deserves further explanation. Within the revised manuscript, we have expanded the Introduction and Results to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. In one study (C. J. van der Linden et al., 2020), UNO treatment was found to cause a fraction of OSN subtypes to exhibit lower birthrates and representations on the closed side of the OE relative to the open. By contrast, another fraction of OSN subtypes exhibited higher representations on the closed side of the OEs of UNO-treated mice, but no difference in birthrates between the two sides. The latter subtypes were found to be distinguished by their receipt of extremely high levels of odor stimulation, suggesting that reduced odor stimulation via naris occlusion may lengthen their lifespans. In support of the possibility that Olfr912 and Olfr1295 belong to this latter category, these subtypes detect SBT and MTMT, respectively (Vihani et al., 2020), which are emitted specifically by male mice (Lin et al., 2005; Schwende et al., 1986), and UNO treatment was previously found to increase total Olfr912 OSN quantities on the closed side compared to the open side in sex-separated males (C. van der Linden et al., 2018), a finding confirmed in the present study (Figure 3–figure supplement 1H).

      Taken together, findings from previous studies as well as the current one indicate that olfactory stimulation can accelerate the birthrates and/or reduce the lifespans of OSNs, depending on the specific subtypes and odors within the environment. As we have now indicated in the Discussion, we do not yet know what distinguishes subtypes that undergo stimulation-dependent neurogenesis, but it is conceivable that they detect odors with a particular salience to mice. Thus, observations that some odorants (e.g., musks) cause stimulation-dependent neurogenesis while others do not (e.g., SBT) might reflect an animal’s specific need to adapt its sensitivity to the former. Alternatively, it is conceivable that stimulation-dependent reductions in representations of subtypes such as Olfr912 and Olfr1295 reflect a fundamentally different mode of plasticity that is also adaptive, as has been hypothesized (C. van der Linden et al., 2018; Vihani et al., 2020).

      Reviewer #1 (Recommendations For The Authors):

      To support the main claim, several controls are necessary as mentioned under point 1 of the public review.

      As outlined in our responses to the public review, new experiments within the revised manuscript indicate the following:

      (1) Accelerated birthrates of 3 different musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) are observed in non-occluded mice following exposure to multiple exogenous musk odorants (muscone, ambretone) (Figure 4D-F).

      (2) Exposure of non-occluded mice to non-musk odors (SBT, IAA) does not accelerate the birthrates of musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) (Figure 4D-F).

      (3) Exposure of mice to exogenous musk odors (muscone, ambretone) does not accelerate the birthrates of non-musk responsive OSN subtypes (e.g., Olfr912), including those previously found to undergo stimulation-dependent neurogenesis (Olfr827, Olfr1325) (Figure 4–figure supplement 4C).

      (4) Only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them (e.g., Olfr912 birthrates are not accelerated by SBT exposure) (Figure 4–figure supplement 4C-left).

      In addition, this study could be considerably improved by showing that the proposed mechanism applies beyond a single OSN subtype (olfr235), especially since the most sensitive OR subtype (expressing olfr1440) does not align with the main claim. The introduction states that this is difficult because the ligands for many ORs are unknown including all subtypes previously found to undergo stimulation-dependent neurogenesis referring to your 2020 study. While this reviewer agrees that the lack of deorphanization is a significant hurdle in the field, the 2020 study states that about 4% of all ORs (which should equal >40 ORs) show a stimulus-dependent down-regulation on the closed side, not only the 7 ORs which are closer examined (Figure 1). It would tremendously improve the impact of the current study to show that the proposed effect applies also to one of these other >40 ORs.

      We appreciate this question, as it alerted us to some shortcomings in how our findings were presented within the original manuscript. We respectfully disagree that only findings regarding subtype Olfr235 align with the main hypothesis of this study, which is that discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate. Specifically, we would like to draw attention to experiments on non-occluded female mice exposed to exogenous musk odorants (muscone, ambretone; revised Figures 4D-F; previously, Figure 6). Findings from these experiments provide compelling evidence that exposure to musk odorants causes selective increases in the birthrates of three different musk-responsive OSN subtypes: Olfr235, Olfr1440, and Olfr1431. Thus, we would suggest that results from the present study already show that the proposed mechanism applies to more than just the Olfr235 subtype. However, we agree with what we think is the essence of the reviewer’s point: that it is important to determine the extent to which this mechanism applies to OSN subtypes that are responsive to other (i.e., non-musk) odorants. While, as noted by the reviewer, our previous study identified several OSN subtypes that undergo stimulation-dependent neurogenesis (as well as many others that are predicted to do so) (C. J. van der Linden et al., 2020), we are not aware of ligands that have been identified with high confidence for those subtypes. Although we are in the process of conducting experiments to identify additional odor/subtype pairs to which the mechanism described in this study applies, the early-stage nature of these experiments precludes their inclusion in the present manuscript.

      The ethological and mechanistic relevance of the current study could be significantly improved by showing that musk-related odors that activate olfr235 are actually found in male mouse urine (and additionally are not found in female mouse urine). Otherwise, the implicated link between the acceleration of OSN birthrates by exposure to male odors and acceleration by specific monomolecular odors does not hold, raising the question of any natural relevance (e.g. the proposed adaptive function to increase sensitivity to certain odors).

      As noted in our responses to the public review, we have addressed this important point within the revised manuscript as follows:

      (1) We have included an extensive discussion of what is known about the emission of musk-like odors by mice.

      (2) We have used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk compounds. Although inconclusive, we report that preputial glands contain signals that are structurally consistent with known musk compounds. The findings of these experiments have been included in the revised manuscript (new Appendix 2–figure 1), along with a discussion of their limitations.

      Reviewer #2 (Public Review):

      In their paper entitled "In mice, discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate" Hossain et al. address lifelong neurogenesis in the mouse main olfactory epithelium. The authors hypothesize that specific odorants act as neurogenic stimuli that selectively promote biased OR gene choice (and thus olfactory sensory neuron (OSN) identity). Hossain et al. employ RNA-seq and scRNA-seq analyses for subtype-specific OSN birthdating. The authors find that exposure to male and musk odors accelerates the birthrates of the respective responsive OSNs. Therefore, Hossain et al. suggest that odor experience promotes selective neurogenesis and, accordingly, OSN neurogenesis may act as a mechanism for long-term olfactory adaptation.

      We appreciate this summary but would like to underscore that a mechanism involving biased OR gene choice is just one of two possibilities proposed in the Discussion section to explain how odorant stimulation of specific subtypes accelerates the birthrates of those subtypes.

      The authors follow a clear experimental logic, based on sensory deprivation by unilateral naris occlusion, EdU labeling of newborn neurons, and histological analysis via OR-specific RNA-FISH. The results reveal robust effects of deprivation on newborn OSN identity. However, the major weakness of the approach is that the results could, in (possibly large) parts, depend on "downregulation" of OR subtype-specific neurogenesis, rather than (only) "upregulation" based on odor exposure. While, in Figure 6, the authors show that the observed effects are, in part, mediated by odor stimulation, it remains unclear whether deprivation plays an "active" role as well. Moreover, as shown in Figure 1C, unilateral naris occlusion has both positive and negative effects in a random subtype sample.

      In our view, the present study involves two distinct and complementary experimental designs: 1) odor exposure of UNO-treated animals and 2) odor exposure of non-occluded animals. Here we address this comment with respect to each of these designs:

      (1) For experiments performed on UNO-treated animals, we agree that observed differences in birthrates on the open and closed sides of the OE reflect, largely, a deceleration (i.e., downregulation) of the birthrates of these subtypes on the closed side relative to the open (as opposed to an acceleration of birthrates on the open side). Our objective in using this design was to test the extent to which specific OSN subtypes undergo stimulation-dependent neurogenesis under various odor exposure conditions. According to the main hypothesis of this study, a lower birthrate of a specific OSN subtype on the closed side of the OE compared to the open is predicted to reflect a lower level of odor stimulation on the closed side received by OSNs of that subtype. However (and as described in our responses to reviewer #1), a limitation of this design is that environmental odorants, especially at high concentrations, are likely to stimulate responsive OSNs on the closed side of the OE in addition to the open side due to transnasal and/or retronasal air flow.

      (2) Experiments performed on non-occluded animals were designed to provide critical complementary evidence that specific OSN subtypes undergo accelerated neurogenesis in the presence of specific odors. Using this design, we have found compelling evidence that:

      - Exposure of non-occluded mice to male odors causes the selective acceleration of the birthrate of Olfr235 OSNs (Figure 3G, H).

      - Exposure of non-occluded female mice to two different musk odorants (muscone and ambretone) selectively accelerates the birthrates of three different musk-responsive subtypes: Olfr235, Olfr1440, and Olfr1431 (Figure 4D-F and Figure 4–figure supplement 4C).

      We have reorganized the revised manuscript to more clearly present the most important experimental findings using these two experimental designs. We have also highlighted (via schematics) the experimental conditions (e.g., UNO, non-occlusion, odor exposure) used for each experiment.

      Another weakness is that the authors build their model (Figure 8), specifically the concept of selectivity, on a receptor-ligand pair (Olfr912 that has been shown to respond, among other odors, to the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT)) that would require at least some independent experimental corroboration. At least, a control experiment that uses SBT instead of muscone exposure should be performed.

      We agree that this important concern deserves additional control experiments and discussion. We have addressed this concern within the revised manuscript as follows:

      - Within the Results section, we have added multiple new control experiments (detailed in response to Reviewer #1), including the one recommended above. As suggested, we quantified newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were either exposed to 0.1% SBT or unexposed controls. Exposure to SBT was found to cause no significant increase in quantities of newborn Olfr912 OSNs (newly added Figure 4–figure supplement 4C-left). These findings further support the model in Figure 7 (previously Figure 8) that only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them.

      - Also within the Results section, we have made efforts to better highlight relevant control experiments that were included in the original version, particularly those showing that quantities of newborn Olfr912 OSNs are not affected by UNO in mice exposed to male odors (Figure 2H and Figure 3–figure supplement 1G; previously Figure 2F and Figure 3H) or by exposure of non-occluded females to male odors (Figure 3H; previously Figure 6E). Since Olfr912 is responsive to component(s) of male odors (C. van der Linden et al., 2018; Vihani et al., 2020), these results indicate that this subtype does not have the capacity for stimulation-dependent neurogenesis, which is consistent with our previous findings that only a fraction of subtypes have this capacity (C. J. van der Linden et al., 2020).

      In this context, it is somewhat concerning that some results, which appear counterintuitive (e.g., lower representation and/or transcript levels of Olfr912 and Olfr1295 in mice exposed to male odors) are brushed off as "reflecting reduced survival due to overstimulation." The notion of "reduced survival" could be tested by, for example, a caspase3 assay.

      This is a point that we agree deserves further discussion. Please see the explanation that we have outlined above in response to Reviewer #1.

      Within the revised manuscript, we have expanded the Introduction to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. We outline evidence from previous studies that Olfr912 and Olfr1295 belong to the latter category, and that the representations of these subtypes are likely reduced by male odor overstimulation-dependent shortening of OSN lifespan.

      Important analyses that need to be done to better be able to interpret the findings are to present (i) the OR+/EdU+ population of olfactory sensory neurons not just as a count per hemisection, but rather as the ratio of OR+/EdU+ cells among all EdU+ cells; and (ii) to the ratio of EdU+ cells among all nuclei (UNO versus open naris). This way, data would be normalized to (i) the overall rate of neurogenesis and (ii) any broad deprivation-dependent epithelial degeneration.

      We have addressed this concern in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
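
      The three normalizations compared above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the `HalfSection` record, its field names, and the example counts are hypothetical, and averaging per-section ratios (rather than pooling counts across sections) is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HalfSection:
    """Hypothetical counts from one OE half-section (field names are illustrative)."""
    or_edu_cells: int   # OR+/EdU+ cells: newborn OSNs of one subtype
    edu_cells: int      # all EdU+ cells in the half-section
    dapi_area: float    # DAPI+ area (arbitrary units), a proxy for nuclei number


def normalize(sections):
    """Average each of the three normalizations across half-sections."""
    return {
        # Method 1: newborn OSNs per half-section (used in the study).
        "per_half_section": mean(s.or_edu_cells for s in sections),
        # Method 2: newborn OSNs per EdU+ cell (reviewer suggestion (i)).
        "per_edu_cell": mean(s.or_edu_cells / s.edu_cells for s in sections),
        # Method 3: newborn OSNs per unit DAPI+ area (reviewer suggestion (ii)).
        "per_dapi_area": mean(s.or_edu_cells / s.dapi_area for s in sections),
    }


# Made-up counts for the open and closed sides of two paired sections.
open_side = [HalfSection(4, 200, 2.0), HalfSection(6, 300, 3.0)]
closed_side = [HalfSection(2, 200, 2.0), HalfSection(3, 300, 3.0)]

open_norm = normalize(open_side)
closed_norm = normalize(closed_side)
```

      If the open-to-closed ratio is similar under all three keys, as it is for these made-up counts, the choice of normalization does not change the conclusion, which is the kind of agreement reported above.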

      Finally, the paper will benefit from improved data presentation and adequate statistical testing. Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH, are hard to interpret. Moreover, t-tests should not be employed when data is not normally distributed (as is the case for most of their samples).

      We have made extensive changes within the revised manuscript to increase the clarity and interpretability of the figures, including:

      (1) Addition of a split-channel, high-magnification view of a representative image that shows the overlap of FISH and EdU signals (Figure 2D).

      (2) Addition of experimental schematics and timelines corresponding to each set of experiments.

      In the revised manuscript, several changes to the statistical tests have been made, as follows:

      (1) To assess deviation from normality of the histological quantifications of newborn and total OSNs of specific subtypes in this study, all datasets were tested using the Shapiro-Wilk test for non-normality and the P values obtained are included in Supplementary file 1 (figure source data). Of the 274 datasets tested, 253 were found to have Shapiro-Wilk P values > 0.05, indicating that the vast majority (92%) do not show evidence of significant deviation from a normal distribution.

      (2) A general lack of deviation of the datasets in this study from a normal distribution is further supported by quantile-quantile (QQ) plots, which compare actual data to a theoretically normal distribution (Appendix 4–figure 1). The datasets analyzed were separated into the following categories:

      a. Quantities of newborn OSNs in UNO treated mice (Appendix 4-figure 1A)

      b. Quantities of total OSNs in UNO treated mice (Appendix 4-figure 1B)

      c. Quantities of newborn OSNs in non-occluded mice (Appendix 4-figure 1C)

      d. UNO effect sizes for newborn or total OSNs (Appendix 4-figure 1D)

      (3) Results of both parametric and non-parametric statistical tests of comparisons in this study have been included in Supplementary file 2 (statistical analyses). In general, the results from parametric and non-parametric tests are in good agreement.

      (4) Statistical analyses of differences in OSN quantities in the OEs of non-occluded mice or UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions have now been performed using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli.
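
      For readers unfamiliar with the QQ plots mentioned in point (2), the sketch below shows how the plotted points can be computed using only the Python standard library. It is an illustration under stated assumptions rather than the procedure used to generate Appendix 4–figure 1: the function name `qq_points` is hypothetical, the normal distribution is fitted from the sample mean and standard deviation, and the plotting positions (i - 0.5)/n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    n = len(values)
    # Normal distribution with the same mean and SD as the sample (assumption).
    fitted = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    # Plotting position (i - 0.5) / n for the i-th smallest observation.
    return [
        (fitted.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(ordered, start=1)
    ]


# Made-up quantities; points near the identity line suggest approximate normality.
points = qq_points([3.0, 1.0, 2.0, 5.0, 4.0])
```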

      Reviewer #2 (Recommendations for the Authors):

      The manuscript by Hossain et al. would benefit from a thorough revision. Here, we outline several points that should be addressed:

      Figure 3E - I & Figure 4E&F: Red lines that connect mean values are misleading.

      Within the revised manuscript, the UNO effect size graphs have been modified for clarity, including removal of the lines between mean values except for those comparing changes over time post EdU injection (Figure 6 and Figure 6-figure supplement 1). For these latter graphs, we think that lines help to illustrate changes in effect sizes over time.

      Figure 3E - I: UNO effect sizes (right) should be tested via ANOVA.

      In the revised manuscript, statistical analyses of UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions were done using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (Figure 2-figure supplement 2; Figure 3; Figure 3-figure supplement 1; Figure 4; Figure 4-figure supplements 1, 2). The same tests were used for analysis of differences in OSN quantities in the OEs of non-occluded mice subjected to more than two different experimental conditions (Figure 3; Figure 3-figure supplement 2; Figure 4; Figure 4-figure supplements 3, 4). For comparisons of differences in quantities of newborn OSNs of musk-responsive subtypes at 4 and 7 days post-EdU between non-occluded mice exposed and unexposed to muscone, a two sample ANOVA - fixed-test, using F distribution (right-tailed) was used (Figure 6; Figure 6-figure supplement 1).
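
      As a rough illustration of the FDR adjustment named above, the two-stage linear step-up procedure can be sketched in plain Python. This is not the code used for the analyses, and the response does not state which software was used. The function names and the example p-values are hypothetical. The logic follows the published two-stage definition: run Benjamini-Hochberg at level q/(1+q), estimate the number of true nulls from the rejections, then rerun Benjamini-Hochberg at a correspondingly relaxed level.

```python
def bh_reject_count(sorted_p, q):
    """Number of rejections from the Benjamini-Hochberg step-up at level q."""
    m = len(sorted_p)
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= q * i / m:
            k = i  # step-up: keep the largest rank that passes
    return k


def two_stage_bky(pvalues, q=0.05):
    """Return one reject flag per p-value (two-stage linear step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    sorted_p = [pvalues[i] for i in order]

    # Stage 1: BH at the reduced level q' = q / (1 + q).
    q1 = q / (1 + q)
    r1 = bh_reject_count(sorted_p, q1)

    if r1 == 0 or r1 == m:
        r2 = r1  # nothing or everything rejected: stop after stage 1
    else:
        # Stage 2: estimate the number of true nulls and rerun BH at q' * m / m0.
        m0_hat = m - r1
        r2 = bh_reject_count(sorted_p, q1 * m / m0_hat)

    reject = [False] * m
    for rank, idx in enumerate(order):
        reject[idx] = rank < r2
    return reject


# Made-up p-values for four comparisons.
flags = two_stage_bky([0.001, 0.02, 0.04, 0.5], q=0.05)
```

      With these made-up values, the first stage rejects two hypotheses and the second stage, run at the relaxed level, rejects a third; this gain in power over a single Benjamini-Hochberg pass is the reason for the second stage.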

      Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH: Colabeling is hard / often impossible to discern. Show zoom-ins and better explain the criteria for "colabeling" in the methods.

      In the revised manuscript an enlarged and split-channel view of an image showing multiple newborn Olfr235 OSNs (OR+/EdU+) has been added (Figure 2D). A detailed description of the criteria for OR+/EdU+ OSNs is provided in Methods under the section “Histological quantification of newborn and total OSNs of specific subtypes.”

      Figure 1C: add Olfr912.

      As a control group for iOSN quantities of musk-responsive subtypes in Figure 1, we selected random subtypes that are expressed in the same zones: 2 and 3. Olfr912 OSNs were not included because this subtype was not randomly chosen, nor is it expressed in the same zones (Olfr912 is expressed in zone 4). We also note that the scRNA-seq analysis was done to allow an initial exploration of the hypothesis that some OSN subtypes that are more highly represented in mice exposed to male odors show stimulation-dependent neurogenesis. Considering that the scRNA-seq datasets contain only small numbers of iOSNs of specific subtypes, we think they are more useful for analyzing changes in birthrates within groups of subtypes (e.g., musk-responsive, random) rather than individual subtypes.

      The time of OE dissection is different for data shown in Figure 1 (P28) as compared to other figures (P35). Please comment/discuss.

      Within the Results section of the revised manuscript, we have now clarified that the PD 28 timepoint chosen for EdU birthdating in the histological quantification of newborn OSNs of specific subtypes is analogous to the PD 28 timepoint chosen for identification of immature (Gap43-expressing) OSNs in the scRNA-seq samples. In the case of EdU birthdating, it is necessary to provide a chase period of sufficient length to enable robust and stable expression of an OR, which defines the subtype. A chase period of 7 days was chosen based on a previous study (C. J. van der Linden et al., 2020). Hence, a dissection date of PD 35 was chosen.

      Figure 3F&G: please discuss the female → female effects

      In the Results and Discussion sections of the revised manuscript, we discuss our observation that the Olfr1440 and Olfr1431 subtypes show significantly higher quantities of newborn OSNs on the open side compared to closed sides in UNO-treated females. We speculate that these subtypes may receive some odor stimulation in juvenile females, perhaps via musk or related odors emitted by females themselves or from elsewhere within the environment.

      Figure 4E (and other examples): male → male displays two populations (no effect versus effect); please explain/speculate.

      For some UNO effect sizes, there appears to be a high degree of variation among mice, and, in some cases, this diversity appears to cause the data to separate into groups. We assessed whether this diversity might reflect mice that came from different litters, but this is not the case. Rather, we speculate that the observed diversity most likely reflects low representations of newborn OSNs of some subtypes and/or under specific conditions. The data referred to by the reviewer (now Figure 3–figure supplement 3D), for example, show UNO effect sizes for quantities of newborn Olfr1431 OSNs, which has the lowest representation among the musk-responsive subtypes analyzed in this study.

      Figure 5C-E: It is unclear why strong muscone concentrations (10%) have no effect, whereas no muscone sometimes (D&E) has an effect.

      As discussed in response to comments from Reviewer #1, we speculate that fluctuations in UNO effect sizes in muscone-exposed mice, particularly at high muscone concentrations, may be due, at least in part, to transnasal and/or retronasal air flow [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed side of the OE to muscone concentrations that increase with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 (Figure 4C-middle) and Olfr1440 (Figure 4–figure supplement 1A-middle) OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440). We speculate that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone may reflect overstimulation-dependent reductions in survival.

      As emphasized above, our study also includes experiments on non-occluded animals (Figures 3, 4, 5). Findings from these experiments provide additional evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included an extensive interpretation of UNO-based experiments, including their limitations, within the Results section of the revised manuscript.

      Figure S1: please explain the large error bars regarding "Transcript level".

      We have clarified that the error bars in this figure, which is now Appendix 1–figure 1, correspond to 95% confidence intervals.

      The figure captions could be improved for ease of reading.

      Figure captions have been revised for increased clarity.

      Figure 4: Include Olfr235 data for consistency.

      All OSN subtypes analyzed for the effects of exposure to adult mice on UNO-induced open-side biases in quantities of newborn OSNs have been included in a single figure, which is now Figure 3–figure supplement 3.

      Figure S6F&G: Do not run statistics on n = 2 (G) or 3 (F) samples.

      We have removed statistical test results from comparisons involving fewer than 4 observations.

      Reviewer #3 (Public Review):

      Summary:

      Neurogenesis in the mammalian olfactory epithelium persists throughout the life of the animal. The process replaces damaged or dying olfactory sensory neurons. It has been tacitly assumed that replacement of the OR subtypes is stochastic, although anecdotal evidence has suggested that this may not be the case. In this study, Santoro and colleagues systematically test this hypothesis by answering three questions: is there enrichment of specific OR subtypes associated with neurogenesis? Is the enrichment dependent on sensory stimulus? Is the enrichment the result of differential generation of the OR type or from differential cell death regulated by neural activity? The authors provide some solid evidence indicating that musk odor stimulus selectively promotes the OR types expressing the musk receptors. The evidence argues against a random selection of ORs in the regenerating neurons.

      Strengths:

      The strength of the study is a thorough and systematic investigation of the expression of multiple musk receptors with unilateral naris occlusion or under different stimulus conditions. The controls are properly performed. This study is the first to formulate the selective promotion hypothesis and the first systematic investigation to test it. The bulk of the study uses in situ hybridization and immunofluorescent staining to estimate the number of OR types. These results convincingly demonstrate the increased expression of musk receptors in response to male odor or muscone stimulation.

      Weaknesses:

      A major weakness of the current study is the single-cell RNASeq result. The authors use this piece of data as a broad survey of receptor expression in response to unilateral nasal occlusion. However, several issues with this data raise serious concerns about the quality of the experiment and the conclusions. First, the proportion of OSNs, including both the immature and mature types, constitutes only a small fraction of the total cells. In previous studies of the OSNs using the scRNASeq approach, OSNs constitute the largest cell population. It is curious why this is the case. Second, the authors did not annotate the cell types, making it difficult to assess the potential cause of this discrepancy. Third, given the small number of OSNs, it is surprising to have multiple musk receptors detected in the open side of the olfactory epithelium whereas almost none in the closed side. Since each OR type only constitutes ~0.1% of OSNs on average, the number of detected musk receptors is too high to be consistent with our current understanding and the rest of the data in the manuscript. Finally, unlike the other experiments, the authors did not describe any method details, nor was there any description of quality controls associated with the experiment. The concerns over the scRNASeq data do not diminish the value of the data presented in the bulk of the study but could be used for further analysis.

      We are grateful to the reviewer for raising these important questions.

      In the revised manuscript, we have clarified that the scRNA-seq dataset presented in the original version of the manuscript (now called dataset OE 1) was published and described in detail in a previous study (C. J. van der Linden et al., 2020). The reviewer is correct that the proportion of OSNs within that dataset was lower than in other datasets that have been published more recently (using updated methods). We think this is likely because of the way that the cells were processed (e.g., from cryopreserved single cells followed by live/dead selection). However, because the open and closed sides were processed identically, we do not expect the ratios of OSNs of specific subtypes to be greatly affected. Hence, the differences observed for specific OSN subtypes on the open versus closed sides are expected to be valid.

      As the reviewer notes, there is a surprisingly large difference between the number of OSNs of musk-responsive subtypes on the open and closed sides within the OE 1 dataset. This difference is a key piece of information that led us to formulate the hypothesis in the study: that musk-responsive subtypes are born at a higher rate in the presence of male/musk odor stimulation. And while it is true that, on average, each subtype represents ~0.1% of the population, it is known that there is wide variance in representations among different subtypes [e.g., (Ibarra-Soria et al., 2017)]. The frequencies of the musk-responsive subtypes among all OSNs on the open side of OE 1 (0.3% for Olfr235, 0.4% for Olfr1440, 0.06% for Olfr1434, 0% for Olfr1431, and 1% for Olfr1437) are in line with previous findings.

      To confirm that the scRNA-seq findings from dataset OE 1 are not an artifact of the cell preparation methods used, we generated a second scRNA-seq dataset, OE 2, which has been added to the revised manuscript (Figure 1). The OE 2 dataset was prepared according to the same experimental timeline as OE 1, but the cells were captured immediately after dissociation and live/dead sorting via FACS. As expected, most cells within the OE 2 dataset are OSNs (77% on the open side, 66% on the closed). Importantly, like the OE 1 dataset, the OE 2 dataset shows higher quantities of iOSNs of musk-responsive subtypes on the open side of the OE compared to the closed (normalized for either total cells or total OSNs) (Figure 1–figure supplement 1D, E).

      A weakness of the experiment assessing musk receptor expression is that the authors do not distinguish immature from mature OSNs. Immature OSNs express multiple receptor types before they commit to the expression of a single type. The experiments do not reveal whether mature OSNs maintain an elevated expression level of musk receptors.

      While it is established that multiple ORs are coexpressed at a low level during OSN differentiation (Bashkirova et al., 2023; Fletcher et al., 2017; Hanchate et al., 2015; Pourmorady et al., 2024; Saraiva et al., 2015; Scholz et al., 2016; Tan et al., 2015), this has been found to occur primarily at the immediate neuronal precursor 3 (INP3) stage (Bashkirova et al., 2023; Fletcher et al., 2017), which is characterized by expression of Tex15 (Fletcher et al., 2017; Pourmorady et al., 2024) and precedes the immature OSN (iOSN) stage, which is characterized by expression of Gap43 (Fletcher et al., 2017; McIntyre et al., 2010; Verhaagen et al., 1989). Within the scRNA-seq datasets in the present study, iOSNs of specific subtypes are identified based on robust expression of Gap43 (Log<sub>2</sub> UMI > 1) and a specific OR gene (Log<sub>2</sub> UMI > 2), as described in the figures and methods. Thus, the cells defined as iOSNs are expected to express a single OR gene and this expression should be maintained as iOSNs transition to mOSNs. To confirm these predictions, we carried out a detailed analysis of OR expression at three different stages of OSN differentiation: INP3, iOSN, and mOSN (Figure 1–figure supplement 2). The cells chosen for analysis express the musk-responsive ORs Olfr235 or Olfr1440 or a randomly chosen OR Olfr701, in addition to markers that define INP3, iOSN, or mOSN cells. As expected, individual iOSNs and mOSNs of musk-responsive subtypes were found to exhibit robust and singular OR expression on the open and closed sides of OEs from UNO-treated mice. Moreover, and as observed previously, INP3 cells coexpress multiple OR transcripts at low levels. A detailed description of how the analysis was performed is included in the Methods section under Quantification and statistical analysis.
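      The marker-based definition of iOSNs described above can be illustrated with a minimal sketch. It assumes that each cell is available as a mapping from gene name to an expression value already on the log2 UMI scale; the function names, the example cells, and their values are hypothetical and are not taken from the manuscript's analysis code. Only the two thresholds (Gap43 > 1, OR gene > 2) come from the text.

```python
# Thresholds stated in the response (log2 UMI scale).
GAP43_MIN = 1.0   # Gap43 must exceed this for a cell to count as immature
OR_MIN = 2.0      # the subtype-defining OR gene must exceed this

def is_iosn_of_subtype(cell, or_gene):
    """Return True if a cell passes both thresholds for the given OR gene.

    `cell` is assumed to be a dict of gene name -> log2-scale expression;
    genes absent from the dict are treated as not expressed.
    """
    return cell.get("Gap43", 0.0) > GAP43_MIN and cell.get(or_gene, 0.0) > OR_MIN

def count_iosns(cells, or_gene):
    """Count cells that qualify as iOSNs of the subtype defined by `or_gene`."""
    return sum(is_iosn_of_subtype(cell, or_gene) for cell in cells)

# Hypothetical example cells (values invented for illustration only).
cells = [
    {"Gap43": 2.3, "Olfr235": 3.1},   # passes both thresholds
    {"Gap43": 0.4, "Olfr235": 4.0},   # Gap43 too low: not immature
    {"Gap43": 1.8, "Olfr235": 1.2},   # OR expression too low
    {"Gap43": 1.5, "Olfr1440": 2.6},  # iOSN of a different subtype
]

print(count_iosns(cells, "Olfr235"))   # number of Olfr235 iOSNs in the example
print(count_iosns(cells, "Olfr1440"))  # number of Olfr1440 iOSNs in the example
```

      Requiring both markers to exceed a threshold, rather than merely be detected, is what excludes cells with the low-level OR coexpression attributed to the INP3 stage.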

      Within the histology-based quantifications, newborn OSNs are identified based on their robust RNA-FISH signals corresponding to a specific OR transcript and an EdU label. Considering the EdU chase time of 7 days, most EdU-positive cells are expected to have passed the INP3 stage and be iOSNs or mOSNs. Moreover, considering the low level of OR expression within INP3 cells, it is unlikely OR transcripts are expressed at a high enough level to be detectable and/or counted at this stage and thereby affect newborn OSN quantifications.

      There are also two conceptual issues that are of concern. The first is the concept of selective neurogenesis. The data show an increased expression of musk receptors in response to male odor stimulation. The authors argue that this indicates selective neurogenesis of the musk receptor types. However, it is not clear what the distinction is between elevated receptor expression and a commitment to a specific fate at an early stage of development. As immature OSNs express multiple receptors, a likely scenario is that some newly differentiated immature OSNs have elevated expression of not only the musk receptors but also other receptors. The current experiments do not distinguish the two alternatives. Moreover, as pointed out above, it is not clear whether mature OSNs maintain the increased expression. Although a scRNASeq experiment can clarify it, the authors, unfortunately, did not perform an in-depth analysis to determine at which point of neurogenesis the cells commit to a specific musk receptor type. The quality of the scRNASeq data unfortunately also does not lend confidence for this type of analysis.

      The addition of a second scRNA-seq dataset within the revised manuscript (Figure 1), combined with the new scRNA-seq-based analyses of OR expression in INP3, iOSN, and mOSN cells (Figure 1–figure supplement 2), provides strong evidence that iOSNs and mOSNs robustly express a single OR gene and that cellular expression is stable from the iOSN to the mOSN stage. These analyses do not support a scenario in which odor stimulation causes upregulated expression of multiple ORs and thereby causes apparent increases in quantities of newly generated OSNs that express musk-responsive ORs. Rather, the data firmly support a mechanism in which odor stimulation increases quantities of newly generated OSNs that have stably committed to the robust expression of a single musk-responsive OR.

      A second conceptual issue, the idea of homeostasis in regeneration, which the authors presented in the Introduction, needs clarification. In its current form, it is confusing. It could mean maintenance of the distribution of receptor types, or it could mean the proper replacement of a specific OR type upon the loss of this type. The authors seem to refer to the latter and should define it properly.

      We have revised the Introduction section to clarify our use of the term homeostatic in one instance (paragraph 4) and replace it with more specific language in a second instance (paragraph 5).

      Reviewer #3 (Recommendations For The Authors):

      Concerns over scRNASeq data. It appears that the samples may have included non-OE tissues, which reduced the representation of the OSNs. This experiment may need to be repeated to increase the number of OSNs.

      As outlined in the response to the public comments, we think that the low proportion of OSNs in the OE 1 data set reflects how the cells were prepared and processed. We have now included a second scRNA-seq dataset to address this concern.

      Cell types should be identified in the scRNASeq analysis, and the number of cells documented for each cell type, at least for the OSNs. The data should be made available for general access.

      We have now clarified that the OE 1 dataset was published as part of a previous study (C. J. van der Linden et al., 2020) and was made publicly available as part of that study (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157119). All cell types in the newly generated OE 2 dataset have been annotated (Figure 1) and this dataset has also been made publicly available (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE278693). The numbers and percentages of OSNs within OE 1 and OE 2 datasets have been added to the legend of Figure 1-figure supplement 1.

      The specific OR types should be segregated for mature and immature OSNs. The percentage of a specific OR type should be normalized to the total number of OSNs, rather than the total cells. The current quantification is misleading because it gives the false sense that the muscone receptors represent ~0.1% of cells when the proportion is much higher if only OSNs are considered.

      In the revised manuscript, quantities of iOSNs (Gap43+ cells) of specific subtypes within the OE 1 and OE 2 scRNA-seq datasets are graphed as percentages of both all OSNs (Figure 1E, Figure 1–figure supplement 1D) and all cells (Figure 1–figure supplement 1E). As a percentage of all OSNs, average quantities of iOSNs of musk responsive subtypes on the open side of the OE range from 0.005% (for Olfr1431) to 0.14% (for Olfr1440) (Figure 1E).

      Within the feature plots for the two datasets, the differentiation stages of indicated OSNs have been clearly defined within the figures and figure legends. For the OE 1 dataset, iOSNs are differentiated from mOSNs by arrows (Figure 1–figure supplement 1C). For the OE 2 dataset (Figure 1D), only immature OSNs are shown for simplicity.

      Technical details of the scRNASeq should be documented. In the feature plot of musk-responsive receptors (Figure 1D), it is better to use the actual quantity of expression rather than a binarized representation (with or without an OR). If one needs to use on/off to determine the number of cells for a given OR type, then the criteria of selection should be given.

      Technical details of generation of the scRNA-seq datasets have been documented in the “Method details” section (for the OE 2 dataset) and in the method section of our previous publication of the OE 1 dataset (C. J. van der Linden et al., 2020). Details of the scRNA-seq analyses, including the criteria used to define immature OSNs of specific subtypes, are documented within the “Quantification and statistical analysis” section.

      Within the feature plots, we have decided to show OSNs of a given subtype in a binary fashion using specific colors for the sake of simplicity (Figure 1D, Figure 1–figure supplement 1C). To address the reviewer’s concern, we have added a new figure that provides detailed information about OR transcript expression (levels and genes) within iOSNs and mOSNs of two different musk-responsive subtypes and a randomly chosen subtype (Figure 1–figure supplement 2).

      An in-depth analysis of the onset of OR expression in the GBC, INP, immature, and mature OSNs should be performed. It is also important to determine how many other receptors are detected in the cells that express the musk receptors. The current scRNASeq data may not be of sufficiently high quality and the experiment needs to be repeated. It is also important for the authors to take measures to eliminate ambient RNA contamination.

      The revised manuscript includes a second scRNA-seq dataset (OE 2; Figure 1). Details of how both the original (OE 1) and new datasets were generated have been documented within the Methods sections of the corresponding publications [(C. J. van der Linden et al., 2020); present study]. For both datasets, live/dead selection of cells was performed, which was expected to reduce ambient RNA.

      The revised manuscript also includes a new figure that provides detailed information about OR transcript expression within INP3, iOSN and mOSN cells that express one of two different musk responsive ORs or a randomly chosen OR (Figure 1-figure supplement 2). These data reveal, as reported previously (Bashkirova et al., 2023; Fletcher et al., 2017; Pourmorady et al., 2024), that low levels of multiple OR transcripts are detected in INP3 (Tex15+) cells. By contrast, iOSN (Gap43+) and mOSN (Omp+) cells robustly express a single OR, with little or no expression of other ORs.

      Quantification of cells for Figures 2-7 should be changed. Instead of using cell number per 1/2 section, the data should be calculated using density (using the area of the epithelium) or normalized to the total number of cells (based on DAPI staining). This is because multiple sections are taken from the same mouse along the A-P axis. These sections have different sizes and numbers of cells.

      As noted in response to a similar concern of Reviewer #2, this has been addressed in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
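      The three normalization schemes compared in point (2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name and all input values are hypothetical, and pooling counts across sections before dividing (rather than averaging per-section ratios) is an assumption that the response does not specify.

```python
def normalize_newborn_osns(or_edu_counts, edu_counts, dapi_areas):
    """Summarize newborn OSN counts from one side of one OE in three ways.

    Each argument is a list with one entry per half-section:
      or_edu_counts - number of OR+/EdU+ cells
      edu_counts    - total number of EdU+ cells
      dapi_areas    - DAPI+ area (arbitrary units; hypothetical)
    Counts are pooled across sections before dividing (an assumption).
    """
    total_newborn = sum(or_edu_counts)
    return {
        # 1) mean OR+/EdU+ cells per half-section (method used in the study)
        "per_half_section": total_newborn / len(or_edu_counts),
        # 2) OR+/EdU+ cells per EdU+ cell (reviewer suggestion (i))
        "per_edu_cell": total_newborn / sum(edu_counts),
        # 3) OR+/EdU+ cells per unit DAPI+ area (reviewer suggestion (ii))
        "per_dapi_area": total_newborn / sum(dapi_areas),
    }

# Hypothetical values for five half-sections from one side of one OE.
result = normalize_newborn_osns(
    or_edu_counts=[4, 6, 5, 3, 7],
    edu_counts=[100, 150, 125, 75, 175],
    dapi_areas=[0.5, 0.5, 0.5, 0.5, 0.5],
)
print(result)
```

      When sections are sampled evenly and the open and closed sides of the same section are paired, the three denominators tend to scale together, which is consistent with the reported finding that the methods gave statistically indistinguishable results.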

      References

      Bashkirova, E. V., Klimpert, N., Monahan, K., Campbell, C. E., Osinski, J., Tan, L., Schieren, I., Pourmorady, A., Stecky, B., Barnea, G., Xie, X. S., Abdus-Saboor, I., Shykind, B. M., Marlin, B. J., Gronostajski, R. M., Fleischmann, A., & Lomvardas, S. (2023). Opposing, spatially-determined epigenetic forces impose restrictions on stochastic olfactory receptor choice. eLife, 12, RP87445. https://doi.org/10.7554/eLife.87445

      Coppola, D. M. (2012). Studies of olfactory system neural plasticity: The contribution of the unilateral naris occlusion technique. Neural Plasticity, 2012, 351752. https://doi.org/10.1155/2012/351752

      Fletcher, R. B., Das, D., Gadye, L., Street, K. N., Baudhuin, A., Wagner, A., Cole, M. B., Flores, Q., Choi, Y. G., Yosef, N., Purdom, E., Dudoit, S., Risso, D., & Ngai, J. (2017). Deconstructing Olfactory Stem Cell Trajectories at Single-Cell Resolution. Cell Stem Cell, 20(6), 817-830.e8. https://doi.org/10.1016/j.stem.2017.04.003

      Han, X., Jiang, Y., Feng, N., Yang, P., Zhang, M., Jin, W., Zhang, T., Huang, Z., Zhao, H., Zhang, K., Liu, S., & Hu, D. (2022). Comparison of the Homology Between Muskrat Scented Gland and Mouse Preputial Gland. Journal of Mammalian Evolution, 29(2), 435–446. https://doi.org/10.1007/s10914-022-09604-w

      Hanchate, N. K., Kondoh, K., Lu, Z., Kuang, D., Ye, X., Qiu, X., Pachter, L., Trapnell, C., & Buck, L. B. (2015). Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis. Science (New York, N.Y.), 350(6265), 1251–1255. https://doi.org/10.1126/science.aad2456

      Hossain, K., Smith, M., & Santoro, S. W. (2023). A histological protocol for quantifying the birthrates of specific subtypes of olfactory sensory neurons in mice. STAR Protocols, 4(3), 102432. https://doi.org/10.1016/j.xpro.2023.102432

      Ibarra-Soria, X., Nakahara, T. S., Lilue, J., Jiang, Y., Trimmer, C., Souza, M. A., Netto, P. H., Ikegami, K., Murphy, N. R., Kusma, M., Kirton, A., Saraiva, L. R., Keane, T. M., Matsunami, H., Mainland, J., Papes, F., & Logan, D. W. (2017). Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated. eLife, 6. https://doi.org/10.7554/eLife.21476

      Kelemen, G. (1947). The junction of the nasal cavity and the pharyngeal tube in the rat. Archives of Otolaryngology, 45(2), 159–168. https://doi.org/10.1001/archotol.1947.00690010168002

      Lin, D. Y., Zhang, S.-Z., Block, E., & Katz, L. C. (2005). Encoding social signals in the mouse main olfactory bulb. Nature, 434(7032), 470–477. https://doi.org/10.1038/nature03414

      McIntyre, J. C., Titlow, W. B., & McClintock, T. S. (2010). Axon growth and guidance genes identify nascent, immature, and mature olfactory sensory neurons. Journal of Neuroscience Research, 88(15), 3243–3256. https://doi.org/10.1002/jnr.22497

      Pourmorady, A. D., Bashkirova, E. V., Chiariello, A. M., Belagzhal, H., Kodra, A., Duffié, R., Kahiapo, J., Monahan, K., Pulupa, J., Schieren, I., Osterhoudt, A., Dekker, J., Nicodemi, M., & Lomvardas, S. (2024). RNA-mediated symmetry breaking enables singular olfactory receptor choice. Nature, 625(7993), 181–188. https://doi.org/10.1038/s41586-023-06845-4

      Saraiva, L. R., Ibarra-Soria, X., Khan, M., Omura, M., Scialdone, A., Mombaerts, P., Marioni, J. C., & Logan, D. W. (2015). Hierarchical deconstruction of mouse olfactory sensory neurons: From whole mucosa to single-cell RNA-seq. Scientific Reports, 5, 18178. https://doi.org/10.1038/srep18178

      Sato-Akuhara, N., Horio, N., Kato-Namba, A., Yoshikawa, K., Niimura, Y., Ihara, S., Shirasu, M., & Touhara, K. (2016). Ligand Specificity and Evolution of Mammalian Musk Odor Receptors: Effect of Single Receptor Deletion on Odor Detection. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 36(16), 4482–4491. https://doi.org/10.1523/JNEUROSCI.3259-15.2016

      Scholz, P., Kalbe, B., Jansen, F., Altmueller, J., Becker, C., Mohrhardt, J., Schreiner, B., Gisselmann, G., Hatt, H., & Osterloh, S. (2016). Transcriptome Analysis of Murine Olfactory Sensory Neurons during Development Using Single Cell RNA-Seq. Chemical Senses, 41(4), 313–323. https://doi.org/10.1093/chemse/bjw003

      Schwende, F. J., Wiesler, D., Jorgenson, J. W., Carmack, M., & Novotny, M. (1986). Urinary volatile constituents of the house mouse, Mus musculus, and their endocrine dependency. Journal of Chemical Ecology, 12(1), 277–296. https://doi.org/10.1007/BF01045611

      Shirasu, M., Yoshikawa, K., Takai, Y., Nakashima, A., Takeuchi, H., Sakano, H., & Touhara, K. (2014). Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron, 81(1), 165–178. https://doi.org/10.1016/j.neuron.2013.10.021

      Tan, L., Li, Q., & Xie, X. S. (2015). Olfactory sensory neurons transiently express multiple olfactory receptors during development. Molecular Systems Biology, 11(12), 844. https://doi.org/10.15252/msb.20156639

      van der Linden, C. J., Gupta, P., Bhuiya, A. I., Riddick, K. R., Hossain, K., & Santoro, S. W. (2020). Olfactory Stimulation Regulates the Birth of Neurons That Express Specific Odorant Receptors. Cell Reports, 33(1), 108210. https://doi.org/10.1016/j.celrep.2020.108210

      van der Linden, C., Jakob, S., Gupta, P., Dulac, C., & Santoro, S. W. (2018). Sex separation induces differences in the olfactory sensory receptor repertoires of male and female mice. Nature Communications, 9(1), 5081. https://doi.org/10.1038/s41467-018-07120-1

      Verhaagen, J., Oestreicher, A. B., Gispen, W. H., & Margolis, F. L. (1989). The expression of the growth associated protein B50/GAP43 in the olfactory system of neonatal and adult rats. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 9(2), 683–691.

      Vihani, A., Hu, X. S., Gundala, S., Koyama, S., Block, E., & Matsunami, H. (2020). Semiochemical responsive olfactory sensory neurons are sexually dimorphic and plastic. eLife, 9, e54501. https://doi.org/10.7554/eLife.54501

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Le et al. aimed to explore whether AAV-mediated overexpression of Oct4 could induce neurogenic competence in adult murine Müller glia, a cell type that, unlike its counterparts in cold-blooded vertebrates, lacks regenerative potential in mammals. The primary goal was to determine whether Oct4 alone, or in combination with Notch signaling inhibition, could drive Müller glia to transdifferentiate into bipolar neurons, offering a potential strategy for retinal regeneration.

      The authors demonstrated that Oct4 overexpression alone resulted in the conversion of 5.1% of Müller glia into Otx2+ bipolar-like neurons by five weeks post-injury, compared to 1.1% at two weeks. To further enhance the efficiency of this conversion, they investigated the synergistic effect of Notch signaling inhibition by genetically disrupting Rbpj, a key Notch effector. Under these conditions, the percentage of Müller glia-derived bipolar cells increased significantly to 24.3%, compared to 4.5% in Rbpj-deficient controls without Oct4 overexpression. Similarly, in Notch1/2 double-knockout Müller glia, Oct4 overexpression increased the proportion of GFP+ bipolar cells from 6.6% to 15.8%.

      To elucidate the molecular mechanisms driving this reprogramming, the authors performed single-cell RNA sequencing (scRNA-seq) and ATAC-seq, revealing that Oct4 overexpression significantly altered gene regulatory networks. They identified Rfx4, Sox2, and Klf4 as potential mediators of Oct4-induced neurogenic competence, suggesting that Oct4 cooperates with endogenously expressed neurogenic factors to reshape Müller glia identity.

      Overall, this study aimed to establish Oct4 overexpression as a novel and efficient strategy to reprogram mammalian Müller glia into retinal neurons, demonstrating both its independent and synergistic effects with Notch pathway inhibition. The findings have important implications for regenerative therapies as they suggest that manipulating pluripotency factors in vivo could unlock the neurogenic potential of Müller glia for treating retinal degenerative diseases.

      Strengths:

      (1) Novelty: The study provides compelling evidence that Oct4 overexpression alone can induce Müller glia-to-bipolar neuron conversion, challenging the conventional view that mammalian Müller glia lack neurogenic potential.

      (2) Technological Advances: The combination of a mouse line for Müller glia-specific labeling and genetic modification, AAV-GFAP promoter-mediated gene expression, single-cell RNA-seq, and ATAC-seq provides a comprehensive mechanistic dissection of glial reprogramming.

      (3) Synergistic Effects: The finding that Oct4 overexpression enhances neurogenesis in the absence of Notch signaling introduces a new avenue for retinal repair strategies.

      Weaknesses:

      (1) In this study, the authors did not perform a comprehensive functional assessment of the bipolar cells derived from Müller glia to confirm their neuronal identity and functionality.

      (2) Demonstrating visual recovery in a bipolar cell-deficiency disease model would significantly enhance the translational impact of this work and further validate its therapeutic potential.

      Response: We thank the Reviewer for their evaluation. We agree that functional analysis of Müller glia-derived bipolar cells is indeed important, but is beyond the current scope of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors harness single-cell RNAseq data from zebrafish and mice to identify Oct4 as a candidate driver of neurogenesis. They then use adeno-associated virus vectors to show that while Oct4 overexpression alone converts rare adult Müller glia (MG) to bipolar cells, it synergizes with Notch pathway inhibition to cause this neurogenesis (achieved by Cre-mediated knockout of Rbpj floxed allele). Importantly, they genetically lineage-mark adult MG using a GLAST-CreER transgene and a Sun-GFP reporter, so that any non-MG cells that convert can be identified unambiguously. This is crucial because several high-profile papers made erroneous claims using short promoters in the viral delivery vector itself to mark MG, but those promoters are leaky and mark other non-MG cell types, making it impossible to definitively state whether manipulations studied were actually causing neurogenesis, or were merely the result of expression in pre-existing neurons. Once the authors establish Oct4 + RbpjKO synergy they use snRNAseq/ATACseq to identify known and novel transcription factors that could play a role in driving neurogenesis.

      Strengths:

      The system to mark MG is stringent, so the authors are studying transdifferentiation, not artifactual effects due to leaky viral promoters. The synergy between Oct4 and Notch pathway blockade is notable. The single-cell results add the potential involvement of new players such as Rfx4 in adult-MG-neurogenesis.

      Weaknesses:

      The existing version is difficult to read due to an unusually high number of text errors (e.g. references to the wrong figure panels etc.). A fuller explanation for the fraction of non-MG cells seen in control scRNAseq assays is required, particularly because the neurogenic trajectory which is enhanced in the Oct4/Rbpj-KO context is also evident in the control retina. Claims regarding the involvement of transcription factors in adult neurogenesis (such as Rfx4) need to be toned down unless they are backed up with functional data. It is possible that such factors are important, but equally, they may have no role or a redundant role, and without functional tests, it's impossible to say one way or the other.

      Overall, the authors achieved what they set out to do, and have made new insights into how neurogenesis can be stimulated in MG. Ultimately, a major long-term goal in the field is to replace lost photoreceptors as this is most relevant to many human visual disorders, and while this paper (like all others before it) does not generate rods or cones, it opens new strategies to coax MG to form a related neuronal cell type. Their approach underscores the benefits of using a gold-standard approach for lineage tracing.

      We thank the Reviewer for their evaluation. We have made extensive changes to the manuscript to correct errors and modify discussion as recommended. These are detailed below in our point-by-point responses to specific recommendations to the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor corrections:

      (1) In Figure 1C, top GFAP-mCherry panel, two dim GFP+ cells colocalize with Otx2. Is this caused by the thickness of the optical section, or do some Müller glia express Otx2?

      This indeed reflects the effects of optic imaging thickness. Colocalization of Sun1-GFP and Otx2 is not observed when Z-stack images are examined in GlastCreER;Sun1-GFP retinas. This can also be appreciated by the fact that, in cases of apparent overlap of nuclear envelope-targeted Sun1 and Otx2, the sizes of the labeled areas differ. In cases of true expression overlap, such as is seen following Oct4 overexpression, the labeled areas are the same size, or very nearly so.

      To determine whether the Glast-CreERT2 x Rosa26-LSL-Sun1-GFP mouse line cross-labels Otx2+ bipolar cells, the authors should image the mCherry control sample as a Z-stack with thin optical sections and a small pinhole, and verify whether GFP and Otx2 co-label in that sample.

      Please see above. Since we first described this line (de Melo, et al. 2012), we have examined thousands of sections of GlastCreER;Sun1-GFP retinas, and have yet to see a single GFP-positive neuron. To avoid confusion, however, we have replaced these images with an additional image from a control mCherry-infected GlastCreER;Sun1-GFP retina processed for the same study.

      In the middle upper panel (Oct4-mCherry group), the white arrows indicate GFP colocalized with Otx2 signal, but these cells do not appear to be mCherry-positive; by contrast, the neighboring cells show significant mCherry expression but no colocalization with Otx2. Expression of the GFAP promoter-driven Oct4-mCherry may have stopped after the Müller glia were converted into Otx2+ bipolar cells, but is there an intermediate stage in which Oct4-mCherry and Otx2 are co-expressed? And after Müller glia-to-bipolar conversion, why is Glast-CreERT2-driven GFP expression not suppressed in the same way as GFAP promoter-driven Oct4-mCherry? Could the authors discuss this point?

      We observed a significant number of Muller glia-derived cells expressing both Otx2 and weak mCherry signal. GFP expression is driven by the ubiquitous CAG promoter following Cre-dependent excision of a transcriptional stop cassette. We have modified the text to make this point explicit.

      (2) In Figure S2b, the mouse is labeled with wild type; I assume it should be the same mouse line as Fig.1. Otherwise, the author should describe the source of the GFP signal.

      “Wildtype” in this case refers to GlastCreER;Sun1-GFP controls, which as the Reviewer correctly points out, are not truly wildtype. The genotype of these animals is specified in all figure legends, and is now referred to as “control” rather than “wildtype” in the figures and main text throughout.

      In Figure S2k and l, mCherry control panel, the GFP+ cells appear to co-label with Otx2. Again, is this vertical overlap caused by the thicker optical imaging layer, or by low Müller glia specificity of the mouse line? Please describe the meaning of the stars in Figure S2i,j in the figure legend. Two panels of quantification data are both labeled "n".

      This is, again, an example of the thicker optical imaging layer causing apparent overlap. We have previously demonstrated that the Sun1-GFP+ cells do not co-label with Otx2 in GFAP-mCherry AAV-injected control retinas (Le et al., 2022; Fig. 2C). The asterisks (*) indicate mouse-on-mouse vascular staining, which is now clarified in the figure legend. The 2 figures labeled ‘n’ have been relabeled as ‘m’ and ‘n’.

      (3) In Figure 2c in the top panel, the Otx2 image was wrong; please replace it with the correct one.

      We thank the Reviewer for spotting this error. This is an inadvertent duplication of the single-channel Otx2 staining for mCherry control sample. We have replaced this with the correct image.

      (4) In Figure 3a, the Rbpj-cKO mouse line was used, but where was the GFP signal from? Please verify the mouse line you used in your work. The same question is also asked in Figure S3, S4b.

      GlastCreER;Rbpj<sup>lox/lox</sup>;Sun1-GFP mice were used in Figure 3a. As now specified in the Methods and all figure legends, all mice used in this study carry both the GlastCreER and Sun1-GFP transgenes.

      (5) In Figure S4c,d, at the 5-week time point, quantifying the change in GFP+/Sox2- cells would help readers compare the percentage of Müller glia converting to bipolar cells with Figure 2D, and would support the conclusion that Müller glia convert to bipolar cells rather than proliferate.

      Sox2-/GFP+ cells are a measure of Müller glia to bipolar cell conversion that complements that of GFP+/Otx2+ cells. This is now clarified in the text. We also include quantification of Sox2-/GFP+ neurons at 5 weeks post-injury in Fig. S5b.

      (6) In Figure S1b,c, there is a large portion of cells that are activated Müller glia after NMDA injury. Did the activated Müller glial cells lose their Müller glial identity? Between the loss of Müller glial identity and neuronal reprogramming, are there any markers that can be used to assess whether Müller glial cells are truly transdifferentiating into neurons rather than remaining in a reactive glial state or an intermediate phase?

      Wildtype Müller glia progressively revert to resting state, and by 72 hours post-injury have already lost expression of Klf4 and Myc (Hoang, et al. 2020), a point which is now specifically mentioned in the text. In GlastCreER;Sun1-GFP;Nfia/b/x<sup>lox/lox</sup>;Rbpj<sup>lox/lox</sup> Müller glia, reactive MG appear to largely convert to bipolar and amacrine-like cells, and it remains unclear if they eventually revert to a resting state (Le, et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      This work demonstrates that Oct4 (Pou5f3) can induce neurogenesis in murine Müller glia (MG). Le et al start by showing that murine and zebrafish MG lack expression of Oct4 (Pou5f3) and its target Nanog. To assess the effect of Oct4 they first label adult MG with Sun1-GFP using tamoxifen-treated GlastCreER;Sun1-GFP mice, then later transduce in vivo with AAV vectors expressing mCherry alone or Oct4 + mCherry. Subsequently, they damage the retina with NMDA and assess the effects several weeks later. In Oct4+ cells at 2 weeks there is rare induction of the neural determinant Ascl1, down-regulation of the MG marker Sox2, induction of bipolar markers (Otx2, Scgn, Cabp5) but not amacrine (HuC/D) or rod (Nrl) markers. Combining Oct4 with Notch inhibition (deleting floxed Rbpj) synergistically increases bipolar cell induction, with Otx2 staining rising to >20% of GFP-marked cells, and cells losing MG identity (loss of Sox2/9). EdU labeling was negligible, suggesting direct trans-differentiation. Similar synergy was seen upon combining Oct4 expression with Notch1/2 double gene knockout. Attempts to combine Oct4 with Nfia, Nfib, and Nfix loss were unsuccessful as the GFAP promoter driving Oct4 in MG seems to require these three related transcription factors. scRNAseq confirmed the Oct4-overexpression/Rbpj-KO-driven increase in bipolar cells and decrease in MG cells and revealed that these manipulations may enhance bipolar cell genesis by repressing genes that define quiescent MG and enhancing expression of genes that define reactive MG and neurogenic cells. Finally, multiomic snRNA/scATAC-seq analysis was performed to assess the effect of Oct4 in wt or Rbpj null MG. This approach revealed that, as anticipated, more genes were up and down-regulated in the context of both manipulations vs Oct4 OE alone. Moreover, Oct4 and Rbpj KO reduced chromatin accessibility at target motifs for transcription factors involved in MG identity/quiescence, while MGPCs showed elevated accessibility for neurogenic factors. The combination of Oct4 OE and Rbpj KO induces accessibility at various interesting TF sites that may contribute to the synergistic neurogenesis, including Rfx4, Klf4, Insm1, and others.

      This is an interesting paper that adds to the growing literature on how neurogenesis can be induced in mammalian MG. The focus on Oct4 is interesting and the synergistic effects are striking and analyzed in some detail with scRNAseq and multiomic snRNA/scATACseq. The latter results provide useful new insight into transcriptional programs that may be critical in driving neurogenesis. Functional insight into these new candidates is not explored in this manuscript, but that's beyond the scope of the current work and forms the basis for new studies. There are some overreaching statements in the Discussion that need to be toned down, but apart from that and a long list of textual errors that need to be fixed, this paper is a valuable contribution to the field.

      Major comments

      There are numerous textual errors (some, but not all, examples are detailed in minor comments). It was difficult to follow this paper given the unusually high number of textual errors and the abbreviated legends. Greater attention should be paid to harmonizing the text with the figures and ensuring that the legends are correct and complete.

      The manuscript has been proofread carefully and errors corrected.

      The opening section of the scRNAseq data should outline briefly why sorting for GFP labeled cells purifies a significant fraction of non-MG cell types, despite the earlier claim (which agrees with other publications) that GLAST-CreER transgene expression is highly specific to MG. Presumably, it mainly/totally reflects the co-purification of cells, cell fragments, and/or cell-free mRNA from other lineages. Is it also possible that a fraction (however small) of these cells reflect low-level spurious/temporary activation of GLAST-CreER expression in non-MG? The "contamination" is present despite the addition of the GFP sequence to the reference genome (as explained in Methods). They mention: "a clear differentiation trajectory connecting Muller glia, neurogenic Muller glia-derived progenitor cells (MGPCs), and differentiating amacrine and bipolar cells (Fig. 3b)". However, the same trajectory is evident in control mCherry samples, so one could argue that this trajectory is active in normal retina at some low rate, but that would/should equate to rare sun-GFP+ non-MG in controls. Are there any such cells, even extremely rarely, or is it truly 0%? At any rate, the authors need to raise these concerns and offer some explanation(s) at the start of their scRNAseq Results section. If there are really no such sun-GFP+ cells, the authors should comment on the presence of the apparent inactive trajectory in the Discussion.

      Since we first described this line (de Melo, et al. 2012), we have examined thousands of sections of GlastCreER;Sun1-GFP retinas, and have yet to see a single GFP-positive neuron. We have also previously shown (Hoang, et al. 2020) that FACS-based isolation of GFP-positive cells from GlastCreER;Sun1-GFP yields a roughly thirty-fold enrichment of Muller glia, implying the presence of small numbers of contaminating neurons. We therefore conclude that the presence of small numbers of neurons (rods, cones, bipolar, and amacrine cells) in the control GlastCreER;Sun1-GFP represents contamination rather than low levels of glia-to-neuron conversion, particularly since we are unable to detect the expression of genes such as neurogenic bHLH factors or immature photoreceptor precursor-specific factors such as Prdm1 that indicate the presence of intermediate cell states. This is now addressed in the Results section related to both Figures 3 and 4.

      Discussion:

      In reference to other strategies to induce neurogenesis the authors make the claim that Oct4 is fundamentally different: "In these cases, Müller glia broadly upregulate proneural genes and/or downregulate Notch signaling. Oct4 instead induces expression of the neurogenic transcription factor Rfx4, which is not expressed in developing retina. It is likely that activation of this parallel pathway to neurogenic competence in part accounts for synergistic induction of neurogenesis seen in Rbpj-deficient Müller glia". First, all these strategies, including Oct4, seem to activate bHLH factors, so they have that in common and the authors should note that overlap. More seriously, without functional tests (e.g. KO Rfx4) the authors need to dial back the over-reaching statement that Rfx4 is the fundamental mechanism driving the Oct4 effect. They can certainly suggest that this is one possibility, but equally, Rfx4 may have very little or no effect on neurogenesis, or it could act redundantly with some of the other factors the authors uncovered. It's impossible to know without functional data, so they either need to add the functional data, or hold back on the strong one-sided and overreaching claim.

      Since both Rfx4 expression and motif accessibility are selectively observed following Oct4 overexpression, and Rfx4 also has known neurogenic activity, we stand by our conclusion that it is a particularly strong candidate for mediating the neurogenic effects of Oct4 overexpression. However, the Reviewer is correct that in the absence of functional data, speculation about its function should be qualified. We have done this in the revised manuscript.

      Minor comments

      This sentence in the Results is confusing: "While expression of neurogenic bHLH factors driven by the Gfap promoter was rapidly silenced in Muller glia and activated in amacrine and retinal ganglion cells, Gfap-Oct4-mCherry remained selectively expressed in Muller glia but did not induce detectable levels of Muller glia-derived neurogenesis in the uninjured retina (Le et al., 2022)". The cited reference is at the end so it sounds like the Oct4 assay was performed in Le et al 2022, and there is no reference to a Figure for the Oct4 data in the current paper.

      As stated here, in Le, et al. 2022, we did not observe any conversion of Sun1-GFP-positive Muller glia to neurons in the absence of injury. In the current study, we instead test whether NMDA-induced excitotoxicity induces glia-to-neuron conversion in Muller glia overexpressing Oct4. This is now made clear in the revised text.

      There are many errors and omissions regarding Figure S2:

      Figure S2a, b legend, and panels do not match. 2a should be a schematic of the strategy to label MG with Sun1-GFP using GLAST-Cre and a floxed Sun1-GFP allele, but that's missing and instead, the current 2a is a schematic of AAV vectors. It seems that the current 2b legend may describe the combination of the current 2a and 2b panels.

      This has been corrected.

      Figure S2: Asterisks label certain stained elements in the Oct4 labeled panels, but there is no explanation in the legend. Are these meant to indicate non-specific staining? If so, what is the evidence that the signal is non-specific?

      These asterisks represent non-specific mouse-on-mouse vascular staining observed with the mouse monoclonal anti-Oct4 used in this study. This is now indicated in the figure legend.

      The text refers to Ascl1 staining in Figure S2e,f, but it's S2g,h.

      This has been corrected.

      Re this: "While Sun1-GFP-positive cells infected with Oct4-mCherry mostly express the Muller glial marker Sox2 (Fig. S2a,b), from 2 weeks post-injury onwards a subset of GFP positive cells did not show detectable Sox2 expression (Fig. S2b, yellow arrows)". Figure S2a, b are schematic diagrams, not immunofluorescence. They probably mean Figure S2c, d.

      This has been corrected.

      Fig S2m is mislabeled "n".

      This has been corrected.

      There are probably other errors with this figure, but I mostly gave up at this point. The authors should go through the paper to find and correct any additional mistakes/omissions in the text and legends.

      The manuscript has been carefully proofread and errors corrected.

      The figure panels are not always mentioned in the order that they appear. There are many examples.

      Figure panels are now mentioned in the order that they appear.

      Several schematics use "d-18-14" to indicate "day -18 to -14". The former is at first uninterpretable or at best unclear (could mean day -18 to day 14), perhaps d -18 to -14, or d -18:-14 would be clearer.

      This has been corrected.

      Re: "AAV-infected wildtype Muller glia could be readily identified by selective expression of Oct4 (Fig. 4e). Wildtype Oct4-expressing Muller glia give rise to both small numbers of neurogenic MGPCs (Fig. 4b),". Figure 4E is labeled Pou5f1, but it would be helpful to avoid confusion by also indicating on the figure that Pou5f1 = Oct4; and Fig 4b does not indicate neurogenic MGPCs (perhaps they mean 4c).

      This has been corrected.

      Some parts of the Results are written in the present tense and should be in the past tense (for guidance: https://www.nature.com/scitable/topicpage/effective-writing13815989/).

      Past tense is now used throughout.

      Pit1 (Pou1f1) is referred to as a "close variant" of Oct4/Pou5f1, but this is unclear (e.g. variant could mean a splice variant from the same locus) and the term "paralogue" should be used.

      “Paralogue” is now used in this context.

      Re: "Infection with Oct4-mCherry vector induced both Oct4 (Fig. S5e) and Ascl1 (Fig. S5d) expression in Notch1/2-deficient Müller glia." Supplementary image 5d is the one depicting Oct4 and 5e is the one showing Ascl1. However, the reference is reversed.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We deeply appreciate the reviewer’s careful review and critiques. These are excellent critiques that we are working on and that will probably require a few more years of work. Published together, we believe these critiques add value to our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result; (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is acknowledged but not experimentally addressed; (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not addressed in this study, making the inclusion of this observation in this manuscript incomplete; and (d) assessment of SLPI levels in healthy controls vs. Lyme disease patients is inadequate.

      We greatly appreciate the critiques, and we do agree. Even though the observation of NE level is predictable, we believe that it is important to actually demonstrate it in the context of murine Lyme arthritis. The function of SLPI goes beyond inhibiting NE. This is an ongoing project in our lab, and we believe that the current study serves as a good starting point to explore the pleiotropic effects of SLPI in the pathogenesis of murine Lyme arthritis and in patients. The critiques here are of great value to our research.

      Comments on revised version:

      Several of the points were addressed in the revised manuscript, but the following issues remain:

      Previous point that the relationship of SLPI binding to B. burgdorferi to the enhanced disease of SLPI-deficient mice is not investigated: The authors indicate that such investigations are ongoing. In the absence of any findings, I recommend that their interesting BASEHIT and subsequent studies be presented in a future study, which would have high impact.

      We thank the reviewer for the critique. We do agree that this part of the story is not complete. However, we would like to keep the BASEHIT and binding data in the paper, as we believe that it is an important finding. We confirmed the binding using ELISA, flow cytometry, and immunofluorescent microscopy. We showed that the binding is specific to an infectious strain of B. burgdorferi and is thus likely to contribute to the pathogenesis of murine Lyme arthritis. Our data suggest that SLPI can directly interact with a B. burgdorferi protein. We are exploring the biological significance of the binding, and this finding can also be explored further by other labs.

      Previous recommendation 1: (The authors added lines 267-68, not 287-68). This ambiguity is acknowledged but remains. In addition, in the revised manuscript, the authors state "However, these data also emphasize the importance of SLPI in controlling the development of inflammation in periarticular tissues of B. burgdorferi-infected mice." Given acknowledged limitations of interpretation, "suggest" would be more appropriate than "emphasize".

      We thank the reviewer for the careful reading, and we apologize for the mistake. The change has been made accordingly (line 268).

      Previous recommendation 5: The lack of clinical samples can be a challenge. Nevertheless, 4 of the 7 samples from LD patients are from individuals suffering from EM rather than arthritis (i.e., the manifestation that is the topic of the study), and some individuals were sampled multiple times, which makes an objective statistical comparison difficult. I don't have a suggestion as to how to address the difference in number of samples from a given subject. However, the authors could consider segregating EM vs. LA in their analysis (although it appears that limiting the comparison between HC and LA patients would not reveal a statistical difference).

      We thank the reviewer for the critique, and we agree that the patient data presented are not ideal. We believe that at this point combining the samples is most logical, as the number of samples we have from patients with Lyme arthritis is fairly limited. We stated this limitation in the discussion. We do believe that the finding of the correlation is important. It suggests a potential function of SLPI in patients, beyond murine infection. Moreover, other groups with larger numbers of different samples can elucidate the relationship further.

      Previous recommendation 6: Given that binding of SLPI to the bacterial surface is an essential aspect of the authors' model, and that the ELISA assay to indicate SLPI binding used cell lysates rather than intact bacteria, a control PI staining to validate the integrity of bacteria seems reasonable.

      We appreciate the suggestion and have provided the propidium iodide staining in Supplemental Figure 5 (lines 539-542, 568-569, 718-722).

      Previous recommendation 8: The inclusion of a no serum control (that presumably shows 100% viability) would validate the authors' assertion that 20% serum has bactericidal activity.

      We appreciate the suggestion. As stated in the manuscript (lines 583-584), the percent viability was normalized to the control spirochete culture without any treatment. Thus, the control spirochete culture, without serum and SLPI treatment, showed 100% viability. We have revised Supplemental Figure 3 accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous timepoints. However, my confidence in the findings and their interpretation is undermined by an incomplete justification for the expected outcomes for each of the proposed scenarios.

      As detailed below, the reviewer’s comment motivated us to conduct simulations to establish the relationship between the scenarios that we seek to adjudicate and the empirical outcomes.

      Main Concern

      Fig 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.:

      - Scenario1. Temporally convergent, same trajectories through connectome state space

      - Scenario2. Temporally divergent, different trajectories through connectome state space

      However, based on my understanding (and apologies if I am mistaken), I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in fig 2C, or the statements in the main text, i.e.:

      - For scenario1, "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      - For scenario2, "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in fig2C from? They do not seem to be fully justified via previous literature, theory, or simulations?

      In particular, I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points. It seems to me that you would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenario 1 and 2 (and under what assumptions it is valid).

      Thank you for raising this important point. In response, we have now included simulation results to complement our earlier authors’ response, which provided literature references and a theoretical explanation of the on-/off-diagonal ratio metric.

      In the absence of theory, the authors could use surrogate data for scenario 1 and 2. For example:

      a. For scenario 1, run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      Note: This simulation was included in the previous round of authors’ responses.

      b. For scenario 2, run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).
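The block-shuffling surrogate described in option b can be sketched as follows. This is only an illustration of the general idea, not code from the study: the function name `block_shuffle`, the placeholder random "EEG" array, the 10 Hz sampling rate, and the choice to drop any incomplete trailing block are all assumptions made here.

```python
import numpy as np

def block_shuffle(data, block_len, rng):
    """Split the time axis (axis 0) into consecutive blocks and permute the block order.

    Samples within a block keep their order; any incomplete trailing block is dropped.
    """
    n_blocks = data.shape[0] // block_len
    usable = data[: n_blocks * block_len]
    blocks = usable.reshape(n_blocks, block_len, *data.shape[1:])
    return blocks[rng.permutation(n_blocks)].reshape(usable.shape)

rng = np.random.default_rng(0)
fs = 10                                    # hypothetical sampling rate (Hz)
eeg = rng.standard_normal((600 * fs, 4))   # placeholder data, shape (time, channels)

# Shuffle in blocks of ~10 s, as suggested in option b.
surrogate = block_shuffle(eeg, block_len=10 * fs, rng=rng)
```

Feeding `eeg` as modality 1 and `surrogate` as modality 2 into the same CRP analysis would then give one example of the scenario 2 case; the within-block temporal structure is preserved while cross-block alignment is destroyed.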

      The authors have provided CRP plots for option a. It shows a CRP, as expected, consistent with scenario 1. This is a useful sanity check. However, as mentioned above, it does not ensure that all CRPs under this scenario will look like this.

      However, the authors have not shown a CRP as per option b. As such, there is an incomplete justification for the expected outcomes of the scenarios.

      Note that another option, which has not been carried out, is to use full simulations, with clearly specified assumptions, for scenarios 1 and 2. One way of doing this is to use a simplified (state-space) setup where you randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in an N-dimensional connectome state space.

      Using this, you can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities

      We followed the reviewer’s suggestion and have now included full simulations to address the concerns regarding the theory of the on-/off-diagonal ratio metric. As recommended, we defined a random quantized signal with N levels to represent the recurrent manifestation of N fixed connectome states. This setup was used to demonstrate the relationship between the two scenarios and the CRP observations used to adjudicate between the scenarios in our paper.

      The CRP matrices in Fig. S10 provide an example illustration of this simulation. In the case where the two state timeseries are identical, there are more co-occurrences of the same state (white entries) on the diagonal than off the diagonal (left subplot). This is in line with Scenario 1, where both spatial and temporal convergence are present. Conversely, in Scenario 2, where state time courses are shuffled, co-occurrences of the same states are more dispersed, and the diagonal prominence vanishes (right subplot). This difference illustrates how the CRP reflects the presence or absence of temporal alignment, dissociating scenarios 1 and 2.

      To quantitatively validate this observation, we calculated the on-/off-diagonal ratio across simulations with varying N values. For Scenario 2 (shuffled version), the ratio consistently remained close to 1, indicating the absence of temporal synchronization. In contrast, Scenario 1 (non-shuffled version) produced significantly higher ratios, exceeding 1, confirming the metric's ability to capture meaningful synchrony. These results demonstrate that the simulations successfully replicate the expected relationship between the two scenarios and the CRPs, and validate the theoretical foundation of the ratio metric under the defined assumptions.
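
      The simulation described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the binary co-occurrence definition of the CRP, and the definition of the ratio as the mean diagonal entry divided by the mean off-diagonal entry are assumptions made for illustration, as are the choices of N = 5 states and 500 time points.

```python
import numpy as np

def simulate_states(n_states, n_timepoints, rng):
    """Random quantized signal: one of n_states connectome states per time point."""
    return rng.integers(0, n_states, size=n_timepoints)

def cross_recurrence(states_a, states_b):
    """Binary CRP (assumed form): entry (i, j) is 1 when modality A at time i
    and modality B at time j are in the same state, else 0."""
    return (states_a[:, None] == states_b[None, :]).astype(float)

def on_off_diagonal_ratio(crp):
    """Mean of the main-diagonal entries divided by the mean of all other entries."""
    diag_mask = np.eye(crp.shape[0], dtype=bool)
    return crp[diag_mask].mean() / crp[~diag_mask].mean()

rng = np.random.default_rng(0)
n_states, n_timepoints = 5, 500  # illustrative values only

states = simulate_states(n_states, n_timepoints, rng)

# Scenario 1: both modalities share the same state time course.
ratio_scenario1 = on_off_diagonal_ratio(cross_recurrence(states, states))

# Scenario 2: same states, but the second modality's time course is shuffled.
shuffled = rng.permutation(states)
ratio_scenario2 = on_off_diagonal_ratio(cross_recurrence(states, shuffled))

print(f"Scenario 1 ratio: {ratio_scenario1:.2f}")
print(f"Scenario 2 ratio: {ratio_scenario2:.2f}")
```

      Under these assumptions, identical time courses fill the diagonal with co-occurrences, so the Scenario 1 ratio is well above 1, whereas shuffling spreads co-occurrences evenly and the Scenario 2 ratio stays near 1. Real FC data would use a continuous similarity measure rather than exact state matches.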

      Minor Concern

      Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." It is great that the authors provide both. However, given that FC in EEG is almost totally dominated by spatial leakage (see Hipp paper), the main results/figures for the scalp EEG should be done using spatial leakage corrected EEG data.

      Thank you. We agree that source leakage is an important consideration, which is why the current work investigated the intracranial EEG-fMRI data as a primary approach and subsequently added the scalp EEG-fMRI approach. While source leakage correction is essential for addressing spurious connectivity, it can also risk removing genuine functional connectivity that includes zero-lag relationships. We are reassured by the observation that the scalp data both without and with leakage correction confirmed the findings of the intracranial data, i.e., the presence of spatial and a lack of temporal cross-modal convergence. As such we do not believe that source leakage had a considerable impact on the specific question at hand.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the timepoints were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involved a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      The authors also did a control analysis and verified the effect of temporal window size or different functional connectivity operationalizations. I also applaud their effort to make the analysis code open-sourced.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors answer my concerns and they are resolved.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study investigates alterations in the autophagic-lysosomal pathway in the Q175 HD knock-in model crossed with the TRGL autophagy reporter mouse. The findings provide valuable insights into autophagy dynamics in HD and the potential therapeutic benefits of modulating this pathway. The study suggests that autophagy stimulation may offer therapeutic benefits in the early stages of HD progression, with mTOR inhibition showing promise in ameliorating lysosomal pathology and reducing mutant huntingtin accumulation.

      However, the data raises concerns regarding the strength of the evidence. The observed changes in autophagic markers, such as autolysosome and lysosome numbers, are relatively modest, and the Western blot results do not fully match the quantitative results. These discrepancies highlight the need for further validation and more pronounced effects to strengthen the conclusions. While the study suggests the potential of autophagy regulation as a long-term therapeutic strategy, additional experiments and more reliable data are necessary to confirm the broader applicability of the TRGL/Q175 mouse model.

      Furthermore, the 2004 publication by Ravikumar et al. demonstrated that inhibition of mTOR by rapamycin or the rapamycin ester CCI-779 induces autophagy and reduces the toxicity of polyglutamine expansions in fly and mouse models of Huntington's disease. mTOR is a key regulator of autophagy, and its inhibition has been explored as a therapeutic strategy for various neurodegenerative diseases, including HD. Studies suggest that inhibiting mTOR enhances autophagy, leading to the clearance of mHTT aggregates. Given that dysfunction of the autophagic-lysosomal pathway and lysosomal function in HD is already well-established, and that mTOR inhibition as a therapeutic approach for HD is also known, this study does not present entirely novel findings.

      Major Concerns:

      (1) In Figure 3A1 and A2, delayed and/or deficient acidification of AL causes deficits in the reformation of LY to replenish the LY pool. However, in Figure S2D, there is no difference in AL formation or substrate degradation, as shown by the Western blotting results for CTSD and CTSB. How can these discrepancies be explained?

      We appreciate the reviewer raising this point, and we agree with the concern. Please note that the material used for our immunoblotting was hemibrain homogenates, containing not only neurons but also glial cells, so the results for any protein, e.g., CTSD or CTSB in Fig. S2D, represented combined signals from neurons and glial cells. Our longstanding experience with western blot analysis of autophagy pathway markers is that signals from glial cells significantly interfere with/dilute the signals from neurons. By contrast, the immunofluorescence (IF) results in Fig. 3A, obtained with the assistance of tfLC3 probe and hue angle-based AV/LY subtype analysis, revealed the in situ conditions of the AL and LY within neurons selectively, which reflects the advantage of using the in vivo neuron-specific expression of the LC3 probe combined with IF with a LY marker in this study and our other related studies (Lee, Rao et al. 2019, Lee, Yang et al. 2022) as explained in the Introduction of this paper. Please also refer to a similar discussion regarding the WB-detected protein levels of p-ATG14 in L542-547. 

      (2) The results demonstrate that in the brain sections of 17-month-old TRGL/Q175 mice, there was an increase in the number of acidic autolysosomes (AL), including poorly acidified autolysosomes (pa-AL), alongside a decrease in lysosome (LY) numbers. These AL/pa-AL changes were not significant in 2-month-old or 7-month-old TRGL/Q175 mice, where only a reduction in lysosome numbers was observed. This indicates that these changes, representing damage to the autophagy-lysosome pathway (ALP), manifest only at later stages of the disease. Considering that the ALP is affected predominantly in the advanced stages of the disease (e.g., at 17 months), why were 6-month-old TRGL/Q175 mice selected for oral mTORi INK treatment, and why was the treatment duration restricted to just 3 weeks?

      We thank the reviewer for the comment. A key outcome measure in our evaluation of mTORi treatment was amelioration of mHTT pathology, i.e., mHTT aggregates/IBs. Before conducting the mTORi treatment experiments, we had learned from our assessments of age-associated progression of mHTT aggresomes/IBs in mice of different ages (e.g., 2-, 6-, 10- and 17-mo) that there were already severe mHTT accumulations in Q175 at 10-mo-old (e.g., Fig. 2A). This is consistent with a previous report (Carty, Berson et al. 2015) showing that striatal mHTT inclusions dynamically increase from 4 to 8 months. From a therapeutic point of view, more aggregates in the mouse brain would make it more difficult for the autophagy machinery to clear these aggregates. Thus, the high degree of aggregates in 10- or 17-mo-old mice may not be modifiable by the mTORi and/or may prevent reliable/sensitive measurements of mTORi-induced phenotype changes. We therefore preferred to apply the treatment to younger (i.e., 6-mo-old) mice, when the mHTT pathology was not so severe and the ALP abnormality was detectable, albeit mild. Additionally, due to the 2-year funding limit for this project, there was insufficient time to generate a large set of old mice (e.g., ~18-mo) for another drug treatment experiment. In future studies, it might be worthwhile to conduct the treatment “in the advanced stages of the disease (e.g., ~18-mo)” to further examine the modification potential of the mTORi on the ALP as well as the HTT aggregations. As for the treatment duration, we were interested in an acute treatment schedule given that, in our dosing tests, we observed rapid responses to the treatment (e.g., target engagement) within a few days even with one dose, and that the 14-15-day treatments produced consistent responses (e.g., Fig. S3A). Long-term treatment, however, would be worth testing in the future, although our current study informs a therapeutic approach that has been suggested by others involving intermittent/pulsatile administration of mTOR inhibitors to minimize the side effects of chronic long-term administration.

      (3) Is the extent of motor dysfunction in TRGL/Q175 mice comparable to that in Q175 mice? Does the administration of mTORi INK improve these symptoms?

      Unfortunately, we were unable to investigate motor functions experimentally with specific assays such as open field or rotarod tests in this study (partly because the funded research period coincided with the peak periods of the COVID-19 pandemic in 2020). Based on our experience in handling the mice, we did not notice any obvious differences between Q175 and TRGL/Q175 mice, nor any improvements after the acute mTORi INK treatment.

      (4) Why is eGFP expression not visible in Fig. 6A in TRGL-Veh mice? Additionally, why do normal (non-poly-Q) mice have fewer lysosomes (LY) than TRGL/Q175-INK mice? IHC results also show that CTSD levels are lower in TRGL mice compared to TRGL/Q175-INK mice. Does this suggest lysosome dysfunction in TRGL-Veh mice?

      We appreciate the reviewer raising this point, which has been corrected (by slightly increasing the eGFP signal in the green channel and the merged channels equally for all genotypes); the revised Fig. 6A now shows better eGFP signals. Regarding the higher LY numbers/CTSD levels in TRGL/Q175-INK compared to the control TRGL-Veh mice, this does not necessarily imply LY dysfunction in TRGL mice; rather, it likely suggests that mTORi treatment induces LY biogenesis. Our original characterization of TRGL mice of varying ages showed that the low expression of the tgLC3 construct produces only a very small increment of total LC3, resulting in no discernible functional changes in the autophagy pathway (Lee, Rao et al. 2019). The mechanism underlying the increased LY biogenesis, e.g., TFEB activation following mTOR inhibition, remains to be investigated in future studies.

      (5) In Figure 5A, the phosphorylation of ATG14 (S29) shows minimal differences in Western blotting, which appears inconsistent with the quantitative results. A similar issue is observed in the quantification of Endo-LC3.

      We welcome the reviewer’s point, and therefore bands showing bigger differences of p-ATG14 (S29) have been used in the revised Fig. 5A, making the images and the quantitative results more consistent and representative. Similar changes have also been made to the Endo-LC3 data at the bottom of Fig. 5A.

      (6) In Figure S2A and Figure S2B, 17-month-old TRGL/Q175 mice show a decrease in pp70S6K and the p-ULK1/ULK1 ratio, but no changes are observed in autophagy-related markers. Do these results indicate only a slight change in autophagy at this stage in TRGL/Q175 mice? Since the mTOR pathway regulates multiple cellular mechanisms, could mTOR also influence other processes? Is it possible that additional mechanisms are involved?

      We completely agree with the reviewer. As mentioned in the text at multiple locations, ALP alterations in Q175 and TRGL/Q175 mice are mild even at a relatively old age (e.g., 17-mo), especially at the protein levels detected by immunoblotting. We agree that while the mild alterations in the levels of pp70S6K (T389) and the p-ULK1/ULK1 ratio may indicate “a slight change in autophagy”, they may also imply that other cellular processes are involved, given that mTOR signaling regulates multiple cellular functions. In particular, p70S6K/p-p70S6K, an mTOR substrate used as a readout for mTOR activity in this study, is a key component of the protein synthesis pathway (Wang and Proud 2006, Magnuson, Ekim et al. 2012), so its changes may serve as readouts for alterations not only in the autophagy pathway but also in the protein synthesis pathway. [A related discussion about mTOR/protein synthesis pathways, in response to a comment from Reviewer 2, has been incorporated into the text under Discussion, L633-640]

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have explored the beneficial effect of autophagy upregulation in the context of HD pathology in a disease stage-specific manner. The authors have observed a functional autophagy lysosomal pathway (ALP) and its machinery at the early stage in the HD mouse model, whereas impairment of the ALP has been documented at the later stages of disease progression. The authors then took advantage of the operational ALP at the early stage of HD pathology to upregulate the ALP and autophagy flux by inhibiting mTORC1 in vivo, which ultimately reversed multiple ALP-related abnormalities and phenotypes. Therefore, this manuscript is a promising effort to shed light on the therapeutic interventions with which HD pathology can be treated at the patient level in the future.

      Strengths:

      The study has shown the alteration of ALP in the HD mouse model in a very detailed manner. Such stage-dependent in vivo study will be informative and has not been done before. Also, this research provides possible therapeutic interventions for patients in the future.

      Weaknesses:

      Some constructive comments and suggestions in order to reflect the key aspects and concepts better in the manuscript:

      (1) The authors have observed lysosome number alteration in a temporally regulated disease stage-specific manner. In this scenario investigation of regulation, localization, and level of TFEB, the transcription factor required for lysosome biogenesis, would be interesting and informative.

      We thank the reviewer for this point and completely agree that exploring TFEB-related aspects would be interesting; these will be investigated in future studies.

      (2) For the general scientific community better clarification of the short forms will be useful. For example, in line 97, page 4, AP full form would be useful. Also 'metabolized via autophagy' can be replaced by 'degraded via autophagy'.

      We thank the reviewer for raising this point. We introduced each abbreviation at the location where the full term first appears and, in the case of “AP”, it was introduced in (previous) Line 69, where “autophagosome” first appears. We agree with the reviewer about ease of reading for the general scientific community, and thus we have added an Abbreviation section after the Key Words section, listing the abbreviations used in this manuscript.

      Also, the word “metabolized” has been replaced with “degraded” as suggested. 

      (3) The nuclear vs cytosolic localization of HTT aggregates shown in Figure 2, are very interesting. The increase in cytosolic HTT aggregate formation at 10 months compared to 6 months probably suggests spatio-temporal regulation of aggregate formation. The authors could comment in a more elaborate manner, on the reason and impact of this kind of regulation of aggregate formation in the context of HD pathology.

      We value the reviewer’s important point. Previous studies have well documented that mHTT aggregates exist in both intranuclear and extranuclear locations in the brains of both human HD and mouse models (DiFiglia, Sapp et al. 1997, Li, Li et al. 1999, Carty, Berson et al. 2015, Peng, Wu et al. 2016, Berg, Veeranna et al. 2024). HTT can travel between the nucleus and cytoplasm and the default location for HTT is cytoplasmic, and thus the occurrence of nuclear mHTT aggregates is considered as a result of dysfunction in the nuclear exporting system for proteins (DiFiglia, Sapp et al. 1995, Gutekunst, Levey et al. 1995, Sharp, Loev et al. 1995, Cornett, Cao et al. 2005) while other factors such as phosphorylation of HTT may also affect nuclear targeting (DeGuire, Ruggeri et al. 2018). Extranuclear aggregates of mHTT usually appear later than nuclear aggregates and develop more aggressively in terms of numbers and pace after their appearance (Li, Li et al. 1999, Carty, Berson et al. 2015, Landles, Milton et al. 2020). The fact that there are neurons containing extranuclear aggregates without having nuclear aggregates within the same cells (Carty, Berson et al. 2015) does not support a nuclear-cytoplasmic sequence for aggregate formation, implying different mechanisms controlling the formation of these two types of aggregates. It was reported that there were no significant differences in toxicity associated with the presence of nuclear compared with extranuclear aggregates (Hackam, Singaraja et al. 1999), while other studies have proposed that nuclear aggregates correlate with transcriptional dysfunction while extranuclear aggregates may impair neuronal communication and can track disease progression (Li, Li et al. 1999, Benn, Landles et al. 2005, Landles, Milton et al. 2020). Thus, the observation of a higher level of extranuclear mHTT aggregates at 10-mo compared to 6-mo from the present study is consistent with previous findings mentioned above. 
      In addition, our EM observations of homogeneous granular/short fine fibril ultrastructure of both nuclear and extranuclear aggregates are consistent with findings from mouse model studies (Davies, Turmaine et al. 1997, Scherzinger, Lurz et al. 1997). Interestingly, this differs from in vitro studies, where nuclear aggregates exhibited a core and shell structure but extranuclear aggregates did not possess the shell (Riguet, Mahul-Mellier et al. 2021), reflecting differences between in vivo and in vitro conditions. Taken together, although efforts have been made in this and previous studies to understand the differences between nuclear and extranuclear aggregates, the mechanisms regarding the spatial-temporal regulation of aggregate formation have not yet been fully revealed and will require additional investigation.

      (4) In this manuscript, the authors have convincingly shown that mTOR inhibition is inducing autophagy in the HD mouse model in vivo. On the other hand, mTOR inhibition would also reduce overall cellular protein translation. This aspect of mTOR inhibition can also potentially contribute to the alleviation of disease phenotype and disease symptoms by reducing protein overload in HD pathology. The authors' comments regarding this aspect would be appreciated.

      We recognize the value of the reviewer’s point, with which we completely agree. Lowering mHTT by interfering with protein translation (e.g., through RNAi, antisense oligonucleotides) has been an attractive strategy in HD therapeutic development (Kordasiewicz, Stanek et al. 2012, Tabrizi, Ghosh et al. 2019). As mentioned above, mTOR regulates multiple cellular pathways including protein synthesis, and inhibition of mTOR, as was done in the present study, may potentially affect protein synthesis as well. While our results of decreases in mHTT signals (Fig. 7) can be interpreted as a result of autophagy-mediated clearance of mHTT, the possibility cannot be excluded that mTOR inhibition may result in a reduction in HTT production, which may also contribute to the observed results. Future studies should determine how significant such a contribution is. [The above description has been incorporated into the text under Discussion, L633-640]

      (5) The authors have shown nuclear inclusion formation and aggregation of mHTT and also commented on its potential removal with the UPS system (proteasomal degradation) in vivo. As there is also a reciprocal relationship present between autophagy and proteasomal machineries, upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease. How nuclear proteasomal activity increases to tackle nuclear mHTT IBs, would be interesting to understand in the context of HD pathology. Comments from the authors in this aspect would clarify the role of multiple degradation pathways in handling mutant HTT protein in HD pathology.

      We appreciate the reviewer raising this point. We agree that there are reciprocal relationships between autophagy and the UPS (Korolchuk, Menzies et al. 2010, Park and Cuervo 2013). In general, failure in one pathway would lead to compensatory upregulation of the other pathway, and vice versa (Lee, Park et al. 2019). So, as the reviewer pointed out, “upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease”. However, we proposed in the Discussion that “It is possible that stimulation of autophagy is reducing the mHTT in the cytoplasm and thereby partially relieves the burden of the proteasome both in the cytoplasm and in the nucleus so that the nuclear proteasome operates more effectively”, which is inconsistent with the general expectation of decreased UPS activity. Please note, though, that there are also instances where the two pathways may act in the same direction, e.g., autophagy inhibition disturbs UPS degradative function (Korolchuk, Mansilla et al. 2009, Park and Cuervo 2013). In any case, our statement is speculative and requires verification with additional experiments in the future. One observation reported here that may support the above speculation is the reduction of the AV-non-associated form of mHTT/p62/Ub (Fig. 7B3): some of these species might exist within the nucleus, so their reduced levels may reflect increased intranuclear UPS activity, besides the other possibility that they may travel from the nucleus to the cytosol for clearance, as already discussed in the text. [The last sentence has been incorporated into the text under Discussion, L628-632]

      (6) For the treatment of neurodegenerative disorders taking the temporal regulation into consideration is extremely important, as that will determine the success rate of the treatments in patients. The authors in this manuscript have clearly discussed this scenario. However, for neurodegenerative disordered patients, in most cases, the symptom manifestation is a late onset scenario. In that case, it will be complicated to initiate an early treatment regime in HD patients. If the authors can comment on and discuss the practicality of the early treatment regime for therapeutic purposes that would be impactful.

      We appreciate the reviewer raising this point and we agree with the main concern that “for neurodegenerative disordered patients, in most cases, the symptom manifestation is a late onset scenario.” This is indeed a common challenge in the therapeutic field for neurodegenerative diseases. It should first be noted that the current study is an experimental therapeutic attempt in a mouse model, consistent with a previous report (Ravikumar, Vacher et al. 2004), that serves as a proof of concept for manipulating autophagy (i.e., via inhibiting mTOR in the current setting) as a potential therapeutic, whose clinical practicality requires further verification. Moreover, in our opinion, early diagnosis (e.g., genetic testing in individuals at higher risk for HD) may be key to overcoming the above challenge, i.e., if early diagnosis is enabled, earlier interventions would become possible. [The above description has been incorporated into the text under Discussion, L654-659]

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      Minor concerns:

      (1) Figures 1 and 2 should indicate the number of sections and mice/genotypes.

      Thanks for the suggestion, and the info has been added in the figure legends. 

      (2) Figure 3A2 should explain how AP, AL, pa-AL, and LY are quantified.

      Thanks for raising this point. Please note that the quantitation of AP, AL, pa-AL and LY was performed by the hue angle-based analysis which was described under “Confocal image collection and hue angle-based quantitative analysis for AV/LY subtypes” within the Materials and Methods. A phrase “(see the Materials and Methods)” has been added after the existing description “Hue angle-based analysis was performed for AV/LY subtype determination using the methods described in Lee et al., 2019” in the figure legend.

      References

      Benn, C. L., C. Landles, H. Li, A. D. Strand, B. Woodman, K. Sathasivam, S. H. Li, S. Ghazi-Noori, E. Hockly, S. M. Faruque, J. H. Cha, P. T. Sharpe, J. M. Olson, X. J. Li and G. P. Bates (2005). "Contribution of nuclear and extranuclear polyQ to neurological phenotypes in mouse models of Huntington's disease." Hum Mol Genet 14(20): 3065-3078.

      Berg, M. J., Veeranna, C. M. Rosa, A. Kumar, P. S. Mohan, P. Stavrides, D. M. Marchionini, D. S. Yang and R. A. Nixon (2024). "Pathobiology of the autophagy-lysosomal pathway in the Huntington’s disease brain." bioRxiv: 2024.05.29.596470.

      Carty, N., N. Berson, K. Tillack, C. Thiede, D. Scholz, K. Kottig, Y. Sedaghat, C. Gabrysiak, G. Yohrling, H. von der Kammer, A. Ebneth, V. Mack, I. Munoz-Sanjuan and S. Kwak (2015). "Characterization of HTT inclusion size, location, and timing in the zQ175 mouse model of Huntington's disease: an in vivo high-content imaging study." PLoS One 10(4): e0123527.

      Cornett, J., F. Cao, C. E. Wang, C. A. Ross, G. P. Bates, S. H. Li and X. J. Li (2005). "Polyglutamine expansion of huntingtin impairs its nuclear export." Nat Genet 37(2): 198-204.

      Davies, S. W., M. Turmaine, B. A. Cozens, M. DiFiglia, A. H. Sharp, C. A. Ross, E. Scherzinger, E. E. Wanker, L. Mangiarini and G. P. Bates (1997). "Formation of neuronal intranuclear inclusions underlies the neurological dysfunction in mice transgenic for the HD mutation." Cell 90(3): 537-548.

      DeGuire, S. M., F. S. Ruggeri, M. B. Fares, A. Chiki, U. Cendrowska, G. Dietler and H. A. Lashuel (2018). "N-terminal Huntingtin (Htt) phosphorylation is a molecular switch regulating Htt aggregation, helical conformation, internalization, and nuclear targeting." J Biol Chem 293(48): 18540-18558.

      DiFiglia, M., E. Sapp, K. Chase, C. Schwarz, A. Meloni, C. Young, E. Martin, J. P. Vonsattel, R. Carraway, S. A. Reeves et al. (1995). "Huntingtin is a cytoplasmic protein associated with vesicles in human and rat brain neurons." Neuron 14(5): 1075-1081.

      DiFiglia, M., E. Sapp, K. O. Chase, S. W. Davies, G. P. Bates, J. P. Vonsattel and N. Aronin (1997). "Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain." Science 277(5334): 1990-1993.

      Gutekunst, C. A., A. I. Levey, C. J. Heilman, W. L. Whaley, H. Yi, N. R. Nash, H. D. Rees, J. J. Madden and S. M. Hersch (1995). "Identification and localization of huntingtin in brain and human lymphoblastoid cell lines with anti-fusion protein antibodies." Proc Natl Acad Sci U S A 92(19): 8710-8714.

      Hackam, A. S., R. Singaraja, T. Zhang, L. Gan and M. R. Hayden (1999). "In vitro evidence for both the nucleus and cytoplasm as subcellular sites of pathogenesis in Huntington's disease." Hum Mol Genet 8(1): 25-33.

      Kordasiewicz, H. B., L. M. Stanek, E. V. Wancewicz, C. Mazur, M. M. McAlonis, K. A. Pytel, J. W. Artates, A. Weiss, S. H. Cheng, L. S. Shihabuddin, G. Hung, C. F. Bennett and D. W. Cleveland (2012). "Sustained therapeutic reversal of Huntington's disease by transient repression of huntingtin synthesis." Neuron 74(6): 1031-1044.

      Korolchuk, V. I., A. Mansilla, F. M. Menzies and D. C. Rubinsztein (2009). "Autophagy inhibition compromises degradation of ubiquitin-proteasome pathway substrates." Mol Cell 33(4): 517-527.

      Korolchuk, V. I., F. M. Menzies and D. C. Rubinsztein (2010). "Mechanisms of cross-talk between the ubiquitin-proteasome and autophagy-lysosome systems." FEBS Lett 584(7): 1393-1398.

      Landles, C., R. E. Milton, N. Ali, R. Flomen, M. Flower, F. Schindler, C. Gomez-Paredes, M. K. Bondulich, G. F. Osborne, D. Goodwin, G. Salsbury, C. L. Benn, K. Sathasivam, E. J. Smith, S. J. Tabrizi, E. E. Wanker and G. P. Bates (2020). "Subcellular localization and formation of huntingtin aggregates correlates with symptom onset and progression in a Huntington's disease model." Brain Commun 2(2): fcaa066.

      Lee, J. H., S. Park, E. Kim and M. J. Lee (2019). "Negative-feedback coordination between proteasomal activity and autophagic flux." Autophagy 15(4): 726-728.

      Lee, J. H., M. V. Rao, D. S. Yang, P. Stavrides, E. Im, A. Pensalfini, C. Huo, P. Sarkar, T. Yoshimori and R. A. Nixon (2019). "Transgenic expression of a ratiometric autophagy probe specifically in neurons enables the interrogation of brain autophagy in vivo." Autophagy 15(3): 543-557.

      Lee, J. H., D. S. Yang, C. N. Goulbourne, E. Im, P. Stavrides, A. Pensalfini, H. Chan, C. Bouchet-Marquis, C. Bleiwas, M. J. Berg, C. Huo, J. Peddy, M. Pawlik, E. Levy, M. Rao, M. Staufenbiel and R. A. Nixon (2022). "Faulty autolysosome acidification in Alzheimer's disease mouse models induces autophagic build-up of Abeta in neurons, yielding senile plaques." Nat Neurosci 25(6): 688-701.

      Li, H., S. H. Li, A. L. Cheng, L. Mangiarini, G. P. Bates and X. J. Li (1999). "Ultrastructural localization and progressive formation of neuropil aggregates in Huntington's disease transgenic mice." Hum Mol Genet 8(7): 1227-1236.

      Magnuson, B., B. Ekim and D. C. Fingar (2012). "Regulation and function of ribosomal protein S6 kinase (S6K) within mTOR signalling networks." Biochem J 441(1): 1-21.

      Park, C. and A. M. Cuervo (2013). "Selective autophagy: talking with the UPS." Cell Biochem Biophys 67(1): 3-13.

      Peng, Q., B. Wu, M. Jiang, J. Jin, Z. Hou, J. Zheng, J. Zhang and W. Duan (2016). "Characterization of Behavioral, Neuropathological, Brain Metabolic and Key Molecular Changes in zQ175 Knock-In Mouse Model of Huntington's Disease." PLoS One 11(2): e0148839.

      Ravikumar, B., C. Vacher, Z. Berger, J. E. Davies, S. Luo, L. G. Oroz, F. Scaravilli, D. F. Easton, R. Duden, C. J. O'Kane and D. C. Rubinsztein (2004). "Inhibition of mTOR induces autophagy and reduces toxicity of polyglutamine expansions in fly and mouse models of Huntington disease." Nat Genet 36(6): 585-595.

      Riguet, N., A. L. Mahul-Mellier, N. Maharjan, J. Burtscher, M. Croisier, G. Knott, J. Hastings, A. Patin, V. Reiterer, H. Farhan, S. Nasarov and H. A. Lashuel (2021). "Nuclear and cytoplasmic huntingtin inclusions exhibit distinct biochemical composition, interactome and ultrastructural properties." Nat Commun 12(1): 6579.

      Scherzinger, E., R. Lurz, M. Turmaine, L. Mangiarini, B. Hollenbach, R. Hasenbank, G. P. Bates, S. W. Davies, H. Lehrach and E. E. Wanker (1997). "Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo." Cell 90(3): 549-558.

      Sharp, A. H., S. J. Loev, G. Schilling, S. H. Li, X. J. Li, J. Bao, M. V. Wagster, J. A. Kotzuk, J. P. Steiner, A. Lo, et al. (1995). "Widespread expression of Huntington's disease gene (IT15) protein product." Neuron 14(5): 1065-1074.

      Tabrizi, S. J., R. Ghosh and B. R. Leavitt (2019). "Huntingtin Lowering Strategies for Disease Modification in Huntington's Disease." Neuron 101(5): 801-819.

      Wang, X. and C. G. Proud (2006). "The mTOR pathway in the control of protein synthesis." Physiology (Bethesda) 21: 362-369.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study offers a valuable investigation into the role of cholecystokinin (CCK) in thalamocortical plasticity during early development and adulthood, employing a range of experimental techniques. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be evoked through either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Interestingly, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored through the cortical application of CCK. The authors conclude that CCK contributes to thalamocortical plasticity and may enhance thalamocortical plasticity in aged subjects.

      While the study presents compelling evidence, I would like to offer several suggestions for the authors' consideration:

      (1) Thalamocortical LTP and NMDA-Dependence:

      It is well established that thalamocortical LTP is NMDA receptor-dependent, and blocking cortical NMDA receptors can abolish LTP. This raises the question of why thalamocortical LTP is eliminated when thalamic CCK is knocked down or when cortical CCK receptors are blocked. If I understand the authors' hypothesis correctly (that CCK promotes LTP through a CCKR-intracellular Ca2+-AMPAR pathway), this pathway should not directly interfere with the NMDA-dependent mechanism. A clearer explanation of this interaction would be beneficial.

      Thank you for your question regarding the role of CCK and NMDA receptors (NMDARs) in thalamocortical LTP. We propose that CCK receptor (CCKR) activation enhances intracellular calcium levels, which are crucial for thalamocortical LTP induction. Calcium influx through NMDARs is also essential to reach the threshold required for activating downstream signaling pathways that promote LTP (Heynen and Bear, 2001). Thus, CCKRs and NMDARs may function in a complementary manner to facilitate LTP, with both contributing to the elevation of intracellular calcium.

      However, it is important to note that the postsynaptic mechanisms of thalamocortical LTP in the auditory cortex (ACx) differ from those in other sensory cortices. Studies have shown that thalamocortical LTP in the ACx appears to be less dependent on NMDARs (Chun et al., 2013) than LTP in the somatosensory or visual cortices. Our previous studies also found that while NMDAR antagonists can block HFS-induced LTP in the inner ACx, LTP can still be induced in the presence of CCK even after NMDAR blockade (Chen et al., 2019). These findings suggest that CCK may act through an alternative mechanism involving CCKR-mediated calcium signaling and AMPAR modulation, which partially compensates for the loss of NMDAR signaling. This distinction may reflect functional differences between the ACx and other sensory cortices, as highlighted in previous studies (King and Nelken, 2009).

      While our current study focuses on the role of CCKR-mediated plasticity in the auditory system, further investigations are needed to elucidate how CCKRs and NMDARs interact within the broader framework of thalamocortical neuroplasticity across different cortical regions. Understanding whether similar mechanisms operate in other sensory systems, such as the visual cortex, will be an important direction for future research.

      Heynen, A.J., and Bear, M.F. (2001). Long-term potentiation of thalamocortical transmission in the adult visual cortex in vivo. J Neurosci 21, 9801-9813. 10.1523/jneurosci.21-24-09801.2001.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      Chen, X., Li, X., Wong, Y.T., Zheng, X., Wang, H., Peng, Y., Feng, H., Feng, J., Baibado, J.T., Jesky, R., et al. (2019). Cholecystokinin release triggered by NMDA receptors produces LTP and sound-sound associative memory. Proc Natl Acad Sci U S A 116, 6397-6406. 10.1073/pnas.1816833116.

      King, A. J., & Nelken, I. (2009). Unraveling the principles of auditory cortical processing: can we learn from the visual system? Nature neuroscience, 12(6), 698-701.

      (2) Complexity of the Thalamocortical System:

      The thalamocortical system is intricate, with different cortical and thalamic subdivisions serving distinct functions. In this study, it is not fully clear which subdivisions were targeted for stimulation and recording, which could significantly influence the interpretation of the findings. Clarifying this aspect would enhance the study's robustness.

      Thank you for your valuable feedback. We would like to clarify that stimulation was conducted in the medial geniculate nucleus ventral (MGv), and recording was performed in layer IV of the ACx. Targeting the MGv allows us to investigate the influence of thalamic inputs on auditory cortical responses. Layer IV of the ACx is known to receive direct thalamic projections, making it an ideal site for assessing how thalamic activity influences cortical processing. We will incorporate this clarification into the revised manuscript to enhance the robustness of our study.

      Results section:

      “Stimulation electrodes were placed in the MGB (specifically in the medial geniculate nucleus ventral subdivision, MGv), and recording electrodes were inserted into layer IV of ACx”

      “The recording electrodes were lowered into layer IV of ACx, while the stimulation electrodes were lowered into MGB (MGv subdivision). The final stimulating and recording positions were determined by maximizing the cortical fEPSP amplitude triggered by the ES in the MGB. The accuracy of electrode placement was verified through post-hoc histological examination and electrophysiological responses.”

      (3) Statistical Variability:

      Biological data, including field excitatory postsynaptic potentials (fEPSPs) and LTP, often exhibit significant variability between samples, sometimes resulting in a standard deviation that exceeds 50% of the mean value. The reported standard deviation of LTP in this study, however, appears unusually small, particularly given the relatively limited sample size. Further discussion of this observation might be warranted.

      Thank you for your question. In our experiments, the sample size N represents the number of animals used, while n refers to the number of recordings, with each recording corresponding to a distinct pair of stimulation and recording sites. To adhere to ethical guidelines and minimize animal usage, we often perform multiple recordings within a single animal, such as from different hemispheres of the brain. Although N may appear small, our statistical analyses are based on n, ensuring sufficient data points for reliable conclusions.

      Furthermore, as our experiments are conducted in vivo, we observe lower variability in the increase of fEPSP slopes following LTP induction compared to brain slice preparations, where standard deviations exceeding 50% of the mean are common. This reduced variability likely reflects the robustness of the physiologically intact conditions in the in vivo setup.

      (4) EYFP Expression and Virus Targeting:

      The authors indicate that AAV9-EFIa-ChETA-EYFP was injected into the medial geniculate body (MGB) and subsequently expressed in both the MGB and cortex. If I understand correctly, the authors assume that cortical expression represents thalamocortical terminals rather than cortical neurons. However, co-expression of CCK receptors does not necessarily imply that the virus selectively infected thalamocortical terminals. The physiological data regarding cortical activation of thalamocortical terminals could be questioned if the cortical expression represents cortical neurons or both cortical neurons and thalamocortical terminals.

      Thank you for your question. In Figure 2A, EYFP expression indicates thalamocortical projections, while the co-expression of EYFP with PSD95 confirms the identity of thalamocortical terminals. The CCK-B receptors (CCKBR) are located on postsynaptic cortical neurons. The observed co-labeling of thalamocortical terminals and postsynaptic CCKBR suggests that CCK-expressing neurons in the medial geniculate body (MGB) can release CCK, which subsequently acts on the postsynaptic CCKBR. This evidence supports our interpretation of the functional role of CCK in modulating neural plasticity between thalamocortical inputs and cortical neurons. With Figure 2A, we aim to demonstrate that a substantial proportion of thalamocortical terminals are co-labeled with CCK receptors. We will ensure that this clarification is emphasized in the revised manuscript to address your concerns.

      Results section:

      “Cre-dependent AAV9-EFIa-DIO-ChETA-EYFP was injected into the MGB of CCK-Cre mice. EYFP labeling marked CCK-positive neurons in the MGB. The co-expression of EYFP thalamocortical projections with PSD95 confirms the identity of thalamocortical terminals (yellow), which primarily targeted layer IV of the ACx (Figure 2A, upper panel). Immunohistochemistry revealed that a substantial proportion (15 out of 19, Figure 2A lower right panel) of thalamocortical terminals (arrows) colocalize with CCK receptors (CCKBR) on postsynaptic cortical neurons in the ACx (Figure 2A lower panel), supporting the functional role of CCK in modulating thalamocortical plasticity.”

      (5) Consideration of Previous Literature:

      A number of studies have thoroughly characterized auditory thalamocortical LTP during early development and adulthood. It may be beneficial for the authors to integrate insights from this body of work, as reliance on data from the somatosensory thalamocortical system might not fully capture the nuances of the auditory pathway. A more comprehensive discussion of the relevant literature could enhance the study's context and impact.

      Thank you for your valuable feedback. We will enhance our discussion on auditory thalamocortical LTP during early development and adulthood to provide a more comprehensive context for our study.

      (6) Therapeutic Implications:

      While the authors suggest potential therapeutic applications of their findings, it may be somewhat premature to draw such conclusions based on the current evidence. Although speculative discussion is not harmful, it may not significantly add to the study's conclusions at this stage.

      Thank you for your thoughtful feedback. We agree that the therapeutic applications mentioned in our study are speculative at this stage and should be regarded as a forward-looking perspective rather than definitive conclusions. Our intention was to highlight the broader potential of our findings to inspire further research, rather than to propose immediate clinical applications.

      In light of your feedback, we have adjusted the language in the manuscript to reflect a more cautious interpretation. Speculative discussions are now explicitly framed as hypotheses or possibilities for future exploration. We emphasize that our findings provide a foundation for further investigations into CCK-based plasticity and its implications.

      We believe that appropriately framed forward-thinking discussions are valuable in guiding the direction of future research. We sincerely hope that our current and future work will contribute to a deeper understanding of thalamocortical plasticity and, over time, potentially lead to advancements in human health.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited but may be fleshed out in future work.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction in the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments, which leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends critically on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a window of time during which CCK is produced throughout early and middle adulthood in auditory thalamocortical neurons, establishing a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex are also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with that of young adult mice.

      Strengths:

      (1) The clarity of the results along with the rigorous multi-angled approach provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      Thank you for this insightful comment. We agree that the differential roles of PV-interneurons and pyramidal neurons in CCK-dependent thalamocortical plasticity remain unclear and acknowledge this as an important limitation of our study. Our primary focus was on pyramidal neurons, as our in vivo electrophysiological recordings measured the fEPSP slope in layer IV of the auditory cortex, which primarily reflects excitatory synaptic activity. However, we recognize the critical role of the excitatory-inhibitory balance in cortical function and the potential contribution of PV-interneurons to this process. In future studies, we plan to utilize techniques such as optogenetics, two-photon calcium imaging and cell-type-specific recordings to investigate the distinct contributions of PV-interneurons and pyramidal neurons to CCK-dependent thalamocortical plasticity, thereby providing a more comprehensive understanding of how CCK modulates thalamocortical circuits.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      Thank you for this thoughtful comment. We acknowledge that our study did not directly address the fidelity of temporal processing, which is indeed a critical aspect of auditory function. Our behavioral experiments primarily focused on linking frequency discrimination to the role of CCK in synaptic strengthening within the auditory thalamocortical pathway. However, we agree that enhanced responsivity of the system could also impact temporal processing dynamics, such as the precise timing of auditory responses. Whether this modulation improves or reduces the fidelity of temporal processing remains an open and important question.

      As you noted, understanding these dynamics will require a deeper investigation into the interactions between different cell types, particularly the balance between excitatory and inhibitory neurons. Exploring how CCK modulation affects both the circuit and cellular levels in temporal processing is an important direction for future research, which we plan to pursue. Thank you again for raising this important point.

      Discussion section:

      “While we focused on homosynaptic plasticity at thalamocortical synapses by recording only fEPSPs in layer IV of ACx, it is essential to further explore heterosynaptic effects of CCK released from thalamocortical synapses on intracortical circuits, particularly its role in modulating the excitatory-inhibitory balance. PV-interneurons, as key regulators of cortical inhibition, may contribute to the temporal fidelity of sensory processing, which is critical for auditory perception (Nocon et al., 2023; Cai et al., 2018). Additionally, CCK may facilitate cross-modal plasticity by modulating heterosynaptic plasticity in interconnected cortical areas. Future studies would provide valuable insights into the broader role of CCK in shaping sensory processing and cortical network dynamics.”

      Nocon, J.C., Gritton, H.J., James, N.M., Mount, R.A., Qu, Z., Han, X., and Sen, K. (2023). Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Communications Biology 6, 751. 10.1038/s42003-023-05126-0.

      Cai, D., Han, R., Liu, M., Xie, F., You, L., Zheng, Y., Zhao, L., Yao, J., Wang, Y., Yue, Y., et al. (2018). A Critical Role of Inhibition in Temporal Processing Maturation in the Primary Auditory Cortex. Cereb Cortex 28, 1610-1624. 10.1093/cercor/bhx057.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single-neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      Thank you for your insightful comment. In our in vivo electrophysiological experiments on LTP induction, we recorded neural activity for over 1.5 hours to assess changes in neuronal responses over time, both prior to and following the induction. While single neuron firing data can provide valuable insights, such measurements are inherently more variable due to factors like cortical state fluctuations and the condition of nearby neurons, which makes them less reliable for long-term analysis. For this reason, we focused on fEPSP, as it offers a more stable and robust readout of synaptic activity over extended periods.

      We appreciate your suggestion and recognize the value of single-neuron data in understanding how CCK and HFS affect temporal processing and excitability. In future studies, we will consider incorporating single-neuron analyses to complement our synaptic-level findings and provide a more comprehensive understanding of these mechanisms.

      (4) The authors mention that CCK mRNA was absent in CCK-KO mice, but the data are not provided.

      Thank you for your comment. Data from the CCK-KO mice are presented in Figure 3A (far right) and in the upper panel of Figure 3B (far right). In the lower panel of Figure 3B, data from the CCK-KO group are not shown because the normalized values for this group were essentially zero, as expected due to the absence of CCK mRNA.

      (5) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      Thank you for raising this important point. Pre-pulse inhibition (PPI) of the acoustic startle response indeed involves multiple brain regions, with the ascending auditory pathway playing a key role (Gómez-Nieto et al., 2020). Within the auditory cortex, layer IV neurons receive tonotopically organized inputs from the medial geniculate nucleus and are critical for integrating thalamic inputs and shaping auditory processing.

      In our behavioral experiments, mice were required to discriminate pre-pulses of varying frequencies against a continuous background sound. Given the role of auditory cortical neurons in integrating thalamic inputs and shaping auditory processing, it is likely that synaptic plasticity in these neurons contributes to the enhanced discrimination of pre-pulses. Supporting this idea, our previous work demonstrated that local infusion of CCK, paired with weak acoustic stimuli, significantly increased auditory responses in the auditory cortex (Li et al., 2014). In the current study, we further showed that CCK release during high-frequency stimulation of the thalamocortical pathway induced LTP in layer IV of the auditory cortex. Together, these findings suggest that CCK-dependent synaptic plasticity in layer IV may amplify the cortical representation of weak auditory inputs, thereby improving pre-pulse detection and enhancing PPI performance.

      It is also worth noting that aged mice with hearing loss typically exhibit PPI deficits due to impaired auditory processing (Ouagazzal et al., 2006; Young et al., 2010). We propose that enhanced plasticity in the thalamocortical pathway, mediated by CCK, might partially compensate for these deficits by amplifying residual auditory signals in aged mice. However, the precise mechanisms by which layer IV synaptic plasticity modulates PPI behavior remain to be fully understood. Given the complex dynamics of sensory processing, future studies could explore how layer IV neurons interact with other cortical and subcortical circuits involved in PPI, as well as the specific contributions of excitatory and inhibitory cell types. These investigations will help provide a more comprehensive understanding of the role of CCK in modulating sensory gating and auditory processing.

      Gómez-Nieto, R., Hormigo, S., & López, D. E. (2020). Prepulse inhibition of the auditory startle reflex assessment as a hallmark of brainstem sensorimotor gating mechanisms. Brain sciences, 10(9), 639.

      Li, X., Yu, K., Zhang, Z., Sun, W., Yang, Z., Feng, J., Chen, X., Liu, C.-H., Wang, H., Guo, Y.P., and He, J. (2014). Cholecystokinin from the entorhinal cortex enables neural plasticity in the auditory cortex. Cell Research 24, 307-330. 10.1038/cr.2013.164.

      Ouagazzal, A. M., Reiss, D., & Romand, R. (2006). Effects of age-related hearing loss on startle reflex and prepulse inhibition in mice on pure and mixed C57BL and 129 genetic background. Behavioural brain research, 172(2), 307-315.

      Young, J. W., Wallace, C. K., Geyer, M. A., & Risbrough, V. B. (2010). Age-associated improvements in cross-modal prepulse inhibition in mice. Behavioral neuroscience, 124(1), 133.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) In Figure 1, the authors used different metrics for fEPSP strength. In Figure 1D, the authors used the slope, while they used the amplitude in Figure 1G. It is known that the two metrics are different from each other. While the slope is calculated from a linear regression of voltage against time over the rising phase of the fEPSP, the amplitude represents the voltage value of the fEPSP's peak. Please clarify here and in the Methods which metric you used, because the two terms are not interchangeable.

      Thank you for pointing out this oversight in our manuscript. We confirm that we used the slope of the fEPSP as the metric for assessing synaptic strength throughout the study, including both Figure 1D and Figure 1G. We will make the necessary corrections to ensure clarity and consistency. Thank you for bringing this to our attention.
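
      To make the distinction between the two metrics concrete, here is a minimal sketch that computes both from the same trace. It is not the analysis code used in the study: the function name `fepsp_metrics`, the helper `triangle`, the 20-80% fitting window, and the synthetic traces are all illustrative assumptions. The slope is obtained by a least-squares line fit over the rising phase, and the amplitude is the peak voltage.

```python
import numpy as np

def fepsp_metrics(t_ms, v_mv, lo=0.2, hi=0.8):
    """Return (slope in mV/ms, amplitude in mV) for a negative-going fEPSP.

    The trace is assumed to start at a 0 mV baseline. The 20-80% fitting
    window is an illustrative assumption, not a value from the manuscript.
    """
    t_ms = np.asarray(t_ms, dtype=float)
    v_mv = np.asarray(v_mv, dtype=float)
    peak_idx = int(np.argmin(v_mv))        # most negative sample = fEPSP peak
    amplitude = v_mv[peak_idx]             # peak voltage relative to baseline
    rising = np.arange(peak_idx + 1)       # samples from trace onset to peak
    frac = v_mv[rising] / amplitude        # 0 at baseline, 1 at the peak
    window = rising[(frac >= lo) & (frac <= hi)]
    # Least-squares line through the selected part of the rising phase.
    slope, _intercept = np.polyfit(t_ms[window], v_mv[window], 1)
    return slope, amplitude

def triangle(t, t_peak, peak):
    """Synthetic triangular 'fEPSP' (hypothetical test signal, not real data)."""
    return peak * np.clip(1.0 - np.abs(t - t_peak) / t_peak, 0.0, None)

t = np.linspace(0.0, 10.0, 101)                              # time in ms
slow_slope, slow_amp = fepsp_metrics(t, triangle(t, 5.0, -0.5))
fast_slope, fast_amp = fepsp_metrics(t, triangle(t, 2.5, -0.5))
print(f"slow: slope={slow_slope:.3f} mV/ms, amplitude={slow_amp:.3f} mV")
print(f"fast: slope={fast_slope:.3f} mV/ms, amplitude={fast_amp:.3f} mV")
```

      The two synthetic traces reach the same peak but rise at different rates, so they share an amplitude while differing in slope. This is why a figure must state which of the two metrics it reports.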

      (2) The Methods give no details about the CCK-KO mice. Please provide them. Although the authors used the CCK-KO mouse model as a control, I think that it is not a good choice to test the hypothesis mentioned in lines 165 and 166. The experiment was supposed to monitor the CCK-BR activity after HFS of the MGB and answer whether the CCK-BR will get activated by thalamic stimulation, but the CCK-KO mouse does not have CCK to be released after the optogenetic activation of the Chrimson probe. Therefore, it is expected to give nothing, as if the experimenter had run an experiment without intervention. I think that the appropriate way to examine the hypothesis is to compare mice that were injected with either AAV9-Syn-FLEX-ChrimsonR-tdTomato or AAV9-Syn-FLEX-tdTomato. However, CCK-KO would be a perfect model to confirm that LTP can be generated only in a CCK-dependent manner, by simply running the HFS of the MGB together with cortical recording of the fEPSP. This would also rule out the assumption that the authors mentioned in lines 191 and 192.

      Thank you for your valuable feedback. The rationale behind our experimental design was to validate the newly developed CCK sensor and confirm its specificity. We aimed to verify CCK release post-HFS by comparing the responses of the CCK sensor in CCK-KO mice and CCK-Cre mice. This comparison allowed us to determine that the observed increase in fluorescence intensity post-HFS was specifically due to CCK release, rather than other neurotransmitters induced by HFS.

      We appreciate your suggestion to compare mice injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato and AAV9-Syn-FLEX-tdTomato, as it is indeed a valuable approach for directly testing the hypothesis regarding CCK-BR activation. However, we prioritized using the CCK-KO model to validate the CCK sensor's efficacy and specificity. The validation can be inferred by comparing the CCK sensor activity before and after HFS.

      Regarding concerns mentioned in lines 191 and 192 about potential CCK release from other projections via indirect polysynaptic activation, CCK-KO mice were not suitable for this aspect due to their global knockout of CCK. To address this limitation, we utilized shRNA to specifically down-regulate Cck expression in MGB neurons. This approach focused on the necessity of CCK released from thalamocortical projections for the observed LTP and effectively ruled out the possibility of indirect polysynaptic activation.

      We also acknowledge that the methods section lacked sufficient details about the CCK-KO mice, which may have caused confusion. In the revised methods section, we will add the following details:

      (1) The genotype of the CCK-KO mice used in this study (CCK-ires-CreERT2, Jax#012710).

      (2) A brief description of the CCK-KO validation, emphasizing the absence of CCK mRNA in these mice (as shown in Figure 3A and 3B).

      (3) The experimental purpose of using CCK-KO mice to validate the specificity of the CCK sensor.

      We believe these additions will clarify the rationale for using CCK-KO mice and their role in this study. Thank you again for highlighting these important points.

      (3) Figure 3C: The authors should examine if there is a difference in the baseline of fEPSPs across different age groups as the dependence on the normalization in the analysis within each group would hide if there were any difference of the baseline slope of fEPSP between groups which could be related to any misleading difference after HFS. Also, I wonder about the absence of LTP in P20, which is a closer age to the critical period. Could the authors discuss that, please?

      Thank you for your insightful feedback. To address your concern regarding baseline differences in fEPSP slopes across age groups, we conducted additional analysis. Baseline fEPSP slopes across the three groups (P20, 8w, 18m), normalized to the 8w group, were 64.8 ± 13.1%, 100.0 ± 20.4%, and 58.8 ± 10.3%, respectively. While there was a trend suggesting smaller fEPSP slopes in the P20 and 18m groups compared to the young adult group, these differences were not statistically significant due to data variability (P20 vs. 8w, P = 0.319; 8w vs. 18m, P = 0.147; P20 vs. 18m, P = 1.0, one-way ANOVA). These results suggest that baseline variability is unlikely to confound the observed differences in LTP after HFS. Furthermore, we ensured that normalization minimized any potential baseline effects.
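
      The normalization described above can be sketched as follows. This is an illustration only: the function name `normalize_to_reference` and the slope values are hypothetical and are not data from the study. The sketch also assumes the spread is reported as the standard error of the mean, which the response does not state. Each recording's baseline slope is divided by the mean of the reference group and expressed as a percentage.

```python
import numpy as np

def normalize_to_reference(groups, reference):
    """Express each group's values as a percentage of the reference group's mean.

    Returns {group: (mean_percent, sem_percent)}. Using the SEM as the
    spread measure is an assumption made for this sketch.
    """
    ref_mean = np.mean(groups[reference])
    summary = {}
    for name, values in groups.items():
        pct = 100.0 * np.asarray(values, dtype=float) / ref_mean
        sem = pct.std(ddof=1) / np.sqrt(pct.size)   # sample SD / sqrt(n)
        summary[name] = (pct.mean(), sem)
    return summary

# Hypothetical baseline slopes (arbitrary units), NOT values from the study.
baseline = {
    "P20": [0.2, 0.4],
    "8w": [0.4, 0.6],
    "18m": [0.3, 0.3],
}
summary = normalize_to_reference(baseline, reference="8w")
for name, (mean_pct, sem_pct) in summary.items():
    print(f"{name}: {mean_pct:.1f} ± {sem_pct:.1f}%")
```

      By construction the reference group averages 100%, so the other groups read directly as a fraction of the young adult baseline. The raw, un-normalized values are what reveal whether absolute baselines differ between groups.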

      Regarding the absence of LTP in P20, this likely reflects developmental regulation of CCKBR expression in the auditory cortex (ACx). The HFS-induced thalamocortical LTP observed in our study is CCK-dependent and mechanistically distinct from the NMDA-dependent thalamocortical LTP during the critical period. Specifically, correlated pre- and postsynaptic activity can induce NMDA-dependent thalamocortical LTP only during an early critical period corresponding to the first several postnatal days, after which this pairing becomes ineffective starting from the second postnatal week (Crair and Malenka, 1995; Isaac et al., 1997; Chun et al., 2013). In contrast, the CCK-dependent thalamocortical LTP induced by HFS is robust in adult mice but appears absent in P20, likely due to the lack of postsynaptic CCKBR expression in the ACx at this developmental stage.

      We will include these clarifications in the revised manuscript, particularly in the Discussion section, to provide a more comprehensive explanation of our findings. Thank you for your valuable comments and suggestions.

      Crair, M.C., and Malenka, R.C. (1995). A critical period for long-term potentiation at thalamocortical synapses. Nature 375, 325-328. 10.1038/375325a0.

      Isaac, J.T.R., Crair, M.C., Nicoll, R.A., and Malenka, R.C. (1997). Silent Synapses during Development of Thalamocortical Inputs. Neuron 18, 269-280. https://doi.org/10.1016/S0896-6273(00)80267-6.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      (4) Figure 4F: It is noticed that the baseline fEPSP of the CCK group and ACSF groups were different, which raises a concern about the baseline differences between treatment groups.

      Thank you for your valuable feedback and for pointing out this important detail. We apologize for any confusion caused by the presentation of the data. As noted in the figure legend, the scale bars for the fEPSPs were different between the left (0.1 mV) and right panels (20 µV). This difference in scale may have created the perception of baseline differences between the CCK and ACSF groups. To enhance clarity and avoid potential misunderstanding, we will unify the scale bar values in the revised figure. This adjustment will provide a clearer and more accurate comparison of fEPSPs between groups. Thank you again for bringing this issue to our attention.

      (5) From Figure S2D, it seems that different animals were injected with the drug and ACSF. Therefore, how the authors validate the position of the recording electrode to the cortical area of certain CF and relative EF. Also, there is not enough information about the basis of the selection of the EF. Should it be lower than the CF with a certain value? Was the EF determined after the initial tuning curve in each case? To mitigate this difference, it would be appropriate if the authors examined the presence of a significant difference in the tuning width and CFs between animals exposed to ACSF and CCK-4. This will give some validation of a balanced experiment between ACSF and CCK-4. I wonder also why the authors used rats here not mice, as it will be easier to interpret the results came from the same species.

      Thank you for your thoughtful comments. The effective frequency (EF) was determined after measuring the initial tuning curve for each case. The EF was selected to elicit a clear sound response while maintaining a sufficient distance from the characteristic frequency (CF) to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF. While there were individual differences in EF selection among animals, the methodology for determining EF was standardized and applied consistently across both the ACSF and CCK-4 groups.

      Regarding the use of rats in these experiments, these studies were conducted prior to our current work with mice. The findings in rats provide valuable insights that support our current results in mice. Since the rat data are supplementary to the primary findings, we included them as supplementary material to provide additional context and validation. Furthermore, in consideration of animal welfare, we chose not to replicate these experiments in mice, as the findings from rats were sufficient to support our conclusions.

      Methods section:

      “The tuning curve was determined by plotting the lowest intensity at which the neuron responded to different tones. The characteristic frequency (CF) is defined as the frequency corresponding to the lowest point on this curve. The effective frequency (EF) was determined to elicit a clear sound response while maintaining a sufficient distance from the CF to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF.”
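
      The CF and EF definitions quoted above can be sketched in a few lines. This is a hedged illustration, not the procedure used in the study: the function names, the example tuning curve, and the starting point of the tuning peak are hypothetical, and the detection of the "onset of the fastest rising phase" is not implemented here (the starting point is simply supplied by hand).

```python
import math


def characteristic_frequency(tuning_curve):
    """Return the frequency (kHz) with the lowest response threshold (dB SPL)."""
    return min(tuning_curve, key=tuning_curve.get)


def shift_toward(start_khz, target_khz, octaves):
    """Move `octaves` from start_khz toward target_khz on a log2 frequency axis."""
    direction = 1.0 if target_khz > start_khz else -1.0
    return start_khz * 2.0 ** (direction * octaves)


# Hypothetical tuning curve: frequency (kHz) -> lowest intensity that
# evoked a response (dB SPL). Placeholder values, not recorded data.
curve = {4.0: 60, 8.0: 40, 16.0: 20, 32.0: 50}
cf = characteristic_frequency(curve)

# Hypothetical starting point of the tuning peak, supplied by hand.
peak_start_khz = 8.0
ef = shift_toward(peak_start_khz, cf, octaves=0.2)
print(f"CF = {cf} kHz, EF = {ef:.2f} kHz")
```

      Because the shift is expressed in octaves, the EF is obtained by multiplying (or dividing) the starting frequency by 2 raised to the step size, rather than by adding a fixed number of kHz.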

      (6) Lines 384-386: There are no figures named 5H and I.

      Thank you for pointing this out. The references to Figures 5H and 5I were incorrect and should have referred to Figures 5C and 5D. We sincerely apologize for this oversight and will correct these errors in the revised manuscript to ensure clarity and accuracy. Thank you again for bringing this to our attention.

      (7) The authors should mention the sex of the animals used.

      Thank you for your comment and for highlighting this important detail. The sex of the animals used in this study is specified in the Animals section of the Methods: "In the present study, male mice and rats were used to investigate thalamocortical LTP." We appreciate your careful attention to this point and will ensure that this detail remains clearly stated in the manuscript.

      (8) Lines 534 and 648: These coordinates are difficult to understand. Since the experiment was done on both mice and rats, we need a clear description of the coordinates in both. Also, I think that you should mention the lateral distance from the sagittal suture as the ventral coordinates should be calculated from the surface of the skull above the AC and not from the sagittal suture.

      Thank you for your valuable feedback and for pointing out this important issue. We apologize for any confusion caused by our description of the coordinates. The term “ventral” was deliberately used because the auditory cortex is located on the lateral side of the skull, which may have caused some misunderstanding.

      To provide a clearer and more accurate description of the coordinates, we will revise the text in the manuscript as follows: “A craniotomy was performed at the temporal bone (-2 to -4 mm posterior and -1.5 to -3 mm ventral to bregma for mice; -3.0 to -5.0 mm posterior and -2.5 to -6.5 mm ventral to bregma for rats) to access the auditory cortex.”

      We appreciate your attention to these details and will ensure that the revised manuscript includes this clarification to improve accuracy and eliminate potential confusion. Thank you again for bringing this to our attention.

      (9) Line 536: The author should specify that these coordinates are for the experiment done on mice.

      Thank you for your valuable feedback. We will revise the manuscript to explicitly specify that these coordinates refer to the experiments conducted on mice. This clarification will help improve the clarity and precision of the manuscript. We greatly appreciate your attention to this point and your effort to enhance the quality of our work.

      Methods section:

      “and a hole was drilled in the skull according to the coordinates of the ventral division of the MGB (MGv, AP: -3.2 mm, ML: 2.1 mm, DV: 3.0 mm) for experiments conducted on mice.”

      (10) Line 590: Please add the specifications of the stimulating electrode. Is it unipolar or bipolar? What is the cat.# provided by FHC?

      Thank you for your valuable feedback. The electrodes used in the experiments are unipolar. We will include the catalog number provided by FHC in the revised manuscript for clarity. The revised text will be updated as follows:

      “In HFS-induced thalamocortical LTP experiments, two customized microelectrode arrays with four tungsten unipolar electrodes each, impedance: 0.5-1.0 MΩ (recording: CAT.# UEWSFGSECNND, FHC, U.S.), and 200-500 kΩ (stimulating: CAT.# UEWSDGSEBNND, FHC, U.S.), were used for the auditory cortical neuronal activity recording and MGB ES, respectively.”

      We appreciate your attention to this detail, and we will ensure that the revised manuscript reflects this clarification accurately.

      (11) Lines 612-614: There are no details of how the optic fiber was inserted or post-examined. If there is a word limitation, the authors may reference another study showing these procedures.

      Thank you for your insightful comment and for highlighting this important aspect of the methodology. To address this, we will reference the study by Sun et al. (2024) in the revised manuscript, which provides detailed procedures for optic fiber insertion and post-examination. We believe that this reference will help enhance the clarity and completeness of the methods section.

      Sun, W., Wu, H., Peng, Y., Zheng, X., Li, J., Zeng, D., Tang, P., Zhao, M., Feng, H., Li, H., et al. (2024). Heterosynaptic plasticity of the visuo-auditory projection requires cholecystokinin released from entorhinal cortex afferents. eLife 13, e83356. 10.7554/eLife.83356.

      We appreciate your valuable suggestion, which will contribute to improving the quality of the manuscript.

      Minor concerns:

      (1) The definition of HFS was repeated many times throughout the manuscript. Please mention the defined name for the first time in the manuscript only followed by its abbreviation (HFS).

      Thank you for your suggestion and for pointing out this important detail. We will revise the manuscript to ensure that all abbreviations are defined only upon their first mention in the manuscript, with subsequent mentions using the abbreviations consistently. We appreciate your careful attention to detail and your effort to help improve the manuscript.

      (2) Line 173: There is a difference between here and the methods section (620 nm here and 635 nm there) please correct which wavelength the authors used.

      Thank you for your careful review and for bringing this discrepancy to our attention. We have corrected the inconsistency, and the wavelength has been unified throughout the manuscript to ensure accuracy and clarity. The revised text now reads as follows:

      “The fluorescent signal was monitored for 25s before and 60s after the HFLS (5~10 mW, 620 nm) or HFS application.”

      We appreciate your valuable feedback, which has helped us improve the precision and consistency of the manuscript.

      (3) Line 185: I think the authors should refer to Figure 2G before mentioning the statistical results.

      Thank you for your careful review and for pointing out this oversight. We have now added a reference to Figure 2G at the appropriate location to ensure clarity and logical flow in the manuscript, as recommended.

      (4) Line 202: I think the authors should refer to Figure 2J before mentioning the statistical results.

      Thank you again for your careful review and for highlighting this point. We have revised the manuscript to include a reference to Figure 2J before mentioning the statistical results.

      We appreciate your valuable feedback, which has helped us improve the accuracy and presentation of the results.

      (5) Line 260: Please add appropriate references at the end of the sentence to support the argument.

      Thank you for your valuable suggestion. To address this, we have added appropriate references to support the statement regarding the multiple steps involved between mRNA expression and neuropeptide release. Additionally, we have revised the statement to adopt a more cautious interpretation. The revised text is as follows:

      “It is widely recognized that mRNA levels do not always directly correlate with peptide levels due to multiple steps involved in peptide synthesis and processing, including translation, post-translational modifications, packaging, transportation, and proteolytic cleavage, all of which require various enzymes and regulatory mechanisms (38-41). A disruption at any stage in this process could lead to impaired CCK release, even when Cck mRNA is present.”

      We have included the following references to support this statement:

      38. Mierke, C.T. (2020). Translation and Post-translational Modifications in Protein Biosynthesis. In Cellular Mechanics and Biophysics: Structure and Function of Basic Cellular Components Regulating Cell Mechanics, C.T. Mierke, ed. (Springer International Publishing), pp. 595-665. 10.1007/978-3-030-58532-7_14.

      39. Gualillo, O., Lago, F., Casanueva, F.F., and Dieguez, C. (2006). One ancestor, several peptides post-translational modifications of preproghrelin generate several peptides with antithetical effects. Mol Cell Endocrinol 256, 1-8. 10.1016/j.mce.2006.05.007.

      40. Sossin, W.S., Fisher, J.M., and Scheller, R.H. (1989). Cellular and molecular biology of neuropeptide processing and packaging. Neuron 2, 1407-1417. https://doi.org/10.1016/0896-6273(89)90186-4.

      41. Hook, V., Funkelstein, L., Lu, D., Bark, S., Wegrzyn, J., and Hwang, S.R. (2008). Proteases for processing proneuropeptides into peptide neurotransmitters and hormones. Annu Rev Pharmacol Toxicol 48, 393-423. 10.1146/annurev.pharmtox.48.113006.094812.

      We greatly appreciate your helpful feedback, which has allowed us to improve both the accuracy and the depth of discussion in the manuscript.

      (6) Line 278: The authors mentioned "due to the absence of CCK in aged animals", which was not an appropriate description. It should be a reduction of CCK gene expression or a possible deficient CCK release.

      Thank you for your careful review and for pointing out the inaccuracy in our description. We agree with your suggestion and have revised the statement to more appropriately reflect the findings.

      “Our findings revealed that thalamocortical LTP cannot be induced in aged mice, likely due to insufficient CCK release, despite intact CCKBR expression.”

      This revision ensures a more accurate and precise description of the potential mechanisms underlying the observed phenomenon. We greatly appreciate your valuable feedback, which has helped us improve the clarity and accuracy of the manuscript.

      (7) Line 291: The authors mentioned that "without MGB stimulation", which is confusing. The MGB was stimulated with a single electrical pulse to evoke cortical fEPSPs. Therefore it should be "without HFS of MGB".

      Thank you for pointing this out and for highlighting the potential confusion caused by our original phrasing. Upon review, we recognize that our original phrasing "without MGB stimulation" may have been unclear and could have led to misinterpretation. To clarify, our intention was to describe the period during which CCK was present without any stimulation of the MGB.

      It is important to note that, in the presence of CCK, LTP can be induced even with low-frequency stimulation, including in aged mice. This observation underscores the potent effect of CCK in facilitating thalamocortical LTP, regardless of the specific stimulation protocol used.

      To address this issue, we have revised the sentence for improved clarity as follows:

      "To investigate whether CCK alone is sufficient to induce thalamocortical LTP without activating thalamocortical projections, we infused CCK-4 into the ACx of young adult mice immediately after baseline fEPSP recording. Stimulation was then paused for 15 min to allow for CCK degradation, after which recording resumed."

      We believe this revision resolves the misunderstanding and provides a clearer and more accurate description of the experimental context. We greatly appreciate your insightful feedback, which has helped us refine the manuscript for clarity and precision.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Line 99, 134, possibly other locations: "site" to "sites".

      Thank you for your careful review. We appreciate your attention to detail and have made the necessary corrections in the manuscript.

      (2) Throughout the manuscript there are some minor issues with language choice and subtle phrasing errors and I suggest English language editing.

      Thank you for your suggestion. In response, we have thoroughly reviewed the manuscript and addressed issues related to language choice and phrasing. The text has been carefully edited to ensure clarity, precision, and consistency. We believe these revisions have significantly enhanced the overall quality of the manuscript. We greatly appreciate your feedback, which has been invaluable in improving the presentation of our work.

      (3) Based on the experimental configurations, I do not think it is a problematic caveat, but authors should be aware of the high likelihood of AAV9 jumping synapses relative to other AAV serotypes.

      Thank you for bringing up the potential of AAV9 crossing synapses, a recognized characteristic of this serotype. We appreciate your observation regarding its relevance to our experimental design. In our study, we carefully considered the possibility of trans-synaptic transfer during both the experimental design and data interpretation phases. To minimize the likelihood of significant trans-synaptic spread, we implemented several measures, including controlling the injection volume, using a slow injection rate, and limiting the viral expression time. Post-hoc histological analyses confirmed that the expression of AAV9 was largely confined to the intended regions, with limited evidence of synaptic jumping under our experimental conditions.

      While we acknowledge the inherent potential for AAV9 to cross synapses, we believe this effect does not substantially confound the interpretation of our findings in the current study. To address this concern, we have added a brief discussion on this point in the revised manuscript to enhance clarity. We greatly appreciate your insightful comment, which has helped us further refine our work.

      Discussion section:

      “One potential limitation of our study is the trans-synaptic transfer property of AAV9. To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      (4) The trace identifiers (1-4) do not seem correctly placed/colored in Figure S1D. Please check others carefully.

      Thank you for your careful review and for bringing this issue to our attention. We have corrected the trace identifiers in Figure S1D. Additionally, we have carefully reviewed all other figures to ensure their accuracy and consistency. We greatly appreciate your attention to detail, which has helped improve the overall quality of the manuscript.

      (5) Please provide a value of the laser power range based on calibrated values.

      Thank you for your suggestion. We have included the calibrated laser power range in the revised manuscript as follows:

      “The laser stimulation was produced by a laser generator (5-20 mW(30), Wavelength: 473 nm, 620 nm; CNI laser, China) controlled by an RX6 system and delivered to the brain via an optic fiber (Thorlabs, U.S.) connected to the generator.”

      We appreciate your feedback, which has helped improve the clarity and precision of our methodological description.

      (6) It would be useful to annotate figures in a way that identifies in which transgenic mice experiments are being performed.

      Thank you for your valuable suggestion. We will add annotations to the figures to explicitly identify the type of mice used in each experiment. We believe this enhancement will improve the clarity and accessibility of our results. We greatly appreciate your input in making our manuscript more informative.

      (7) Please comment on the rigor you use to address the accuracy of viral injections. How often did they spread outside of the MGB/AC?

      Thank you for raising this important question regarding the accuracy of viral injections and the potential spread outside the MGB or AC. Below, we provide details for each set of experiments:

      shRNA Experiments:

      For the shRNA experiments targeting the MGB, our primary goal was to achieve comprehensive coverage of the entire MGB. To this end, we used larger injection volumes and multiple injection sites, which inevitably resulted in some viral spread beyond the MGB. However, this approach was necessary to ensure robust knockdown effects that were representative of the entire MGB. While strict confinement to specific subregions could not be guaranteed, this strategy allowed us to prioritize the effectiveness of the knockdown within the target region.

      Fiber photometry Experiments:

      For the fiber photometry experiments targeting the auditory cortex (AC), we used larger injection volumes and multiple injection sites to cover its relatively large size. Although this approach might have resulted in some CCK-sensor virus spread outside the AC, the placement of the optic fiber was guided by the location of the auditory cortex. Consequently, any minor viral expression outside the AC would not affect the experimental results, as recordings were confined to the intended area through precise fiber placement.  

      Optogenetic Experiments:

      For the optogenetic experiments targeting the MGB, we specifically injected virus into the MGv subregion. To minimize viral spread, we employed several strategies, including the use of fine injection needles, waiting for tissue stabilization (7 minutes post-needle insertion), delivering small volumes at a slow rate to prevent backflow, aspirating 5 nL of the solution post-injection, and raising the needle by 100 μm before waiting an additional 5 minutes prior to full retraction. These measures significantly reduced the risk of viral leakage to adjacent regions.

      Histological Validation:

      After the electrophysiological experiments, we systematically verified the accuracy of viral expression by examining histological sections to ensure that the expression was primarily localized within the intended regions.

      Terminology in the Manuscript:

      In the manuscript, we deliberately used the term "MGB" rather than specifically "MGv" to transparently acknowledge the potential for viral spread in some experiments.

      We hope this explanation clarifies the strategies we employed to address the accuracy of viral injections, as well as how we managed potential viral spread. We have also added brief information in the revised manuscript to reflect these points and acknowledge the inherent variability in viral delivery.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their constructive and helpful comments, which led us to make major changes in the model and manuscript, including adding the results of new experiments and analyses. We believe that the revised manuscript is much better than the previous version and that it addresses all issues raised by the reviewers.

      Summary of changes made in the revised manuscript:

      (1) We increased the training set size from 39 video clips to 97 video clips and the testing set size from 25 video clips to 60 video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.

      (2) We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy.

      (3) The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      (4) In addition, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      (5) Analyzing urination and defecation dynamics in an additional strain of mice revealed interesting strain-specific features, as discussed in the revised manuscript.

      (6) Overall, we found DeePosit accuracy to be stable with no significant bias across stages of the experiment, types of the experiment, gender of the mice, strain of mice, and across experimental conditions.

      (7) We also compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy than both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm training set used just a single click inside each spot, allowing faster generation of training sets.

      (8) As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C (Figure 3—Figure Supplement 2).

      (9) We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      (10) In the revised paper, we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.
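
      For readers less familiar with the metrics named in the list above, the following is a minimal sketch of how precision, recall, and F1 follow from detection counts. It is not taken from the DeePosit code: the function name and the counts are hypothetical and serve only to show the arithmetic.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from detection counts.

    precision = TP / (TP + FP): fraction of detections that are correct.
    recall    = TP / (TP + FN): fraction of real events that were detected.
    F1        = harmonic mean of precision and recall.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one class, NOT results from the manuscript.
precision, recall, f1 = precision_recall_f1(
    true_positives=80, false_positives=10, false_negatives=20
)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

      Since F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a class with many missed events cannot reach a high F1 through high precision alone.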

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript provides a novel method for the automated detection of scent marks from urine and feces in rodents. Given the importance of scent communication in these animals and their role as model organisms, this is a welcome tool.

      We thank the reviewer for the positive assessment of our tool.

      Strengths:

      The method uses a single video stream (thermal video) to allow for the distinction between urine and feces. It is automated.

      Weaknesses:

      The accuracy level shown is lower than may be practically useful for many studies. The accuracy of urine is 80%. 

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      This is understandable given the variability of urine in its deposition, but makes it challenging to know if the data is accurate. If the same kinds of mistakes are maintained across many conditions it may be reasonable to use the software (i.e., if everyone is under/over counted to the same extent). Differences in deposition on the scale of 20% would be challenging to be confident in with the current method, though differences of the magnitude may be of biological interest. Understanding how well the data maintain the same relative ranking of individuals across various timing and spatial deposition metrics may help provide further evidence for the utility of the method.

      The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      Reviewer #2 (Public Review):

      Summary:

      The authors built a tool to extract the timing and location of mouse urine and fecal deposits in their laboratory set up. They indicate that they are happy with the results they achieved in this effort.

      Yes, we are.

      The authors note urine is thought to be an important piece of an animal's behavioral repertoire and communication toolkit so methods that make studying these dynamics easier would be impactful.

      We thank the reviewer for the positive assessment of our work.

      Strengths:

      With the proposed method, the authors are able to detect 79% of the urine that is present and 84% of the feces that is present in a mostly automated way.

      Weaknesses:

      The method proposed has a large number of design choices across two detection steps that aren't investigated. I.e. do other design choices make the performance better, worse, or the same? 

      We chose to use a heuristic preliminary detection algorithm for the detection of warm blobs, since warm blobs can be robustly detected with heuristic algorithms without the need for a training set. This design choice might allow easier adaptation of our algorithm to different types of arenas. Another advantage of using a heuristic preliminary detection is the easy control of the preliminary detection parameters, such as the minimum temperature difference for detecting a blob, size limits of the detected blob, cooldown rate and so on, which may help in adapting it to new conditions. As for the classifier, we chose to feed it with a relatively small window surrounding each preliminary detection, and hence it is not affected by the arena’s appearance outside of its region of interest. This should allow lower sensitivity to the arena’s appearance.

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C.
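
      To make the role of the temperature threshold concrete, the following is a minimal sketch of a threshold-based warm-blob detector. It is an illustration under stated assumptions and not the DeePosit implementation: only the 1.6°C threshold comes from the text above, while the function name, the per-pixel background frame, the 4-connectivity, and the size limits are hypothetical. The cooldown-rate check mentioned above is not modeled.

```python
from collections import deque


def detect_warm_blobs(frame, background, threshold_c=1.6,
                      min_pixels=2, max_pixels=500):
    """Return blobs of 4-connected pixels that are at least `threshold_c`
    warmer than the background. Each blob is a list of (row, col) tuples.

    `frame` and `background` are 2D lists of temperatures in degrees C.
    The size limits are hypothetical defaults for this sketch.
    """
    rows, cols = len(frame), len(frame[0])
    warm = [[frame[r][c] - background[r][c] >= threshold_c
             for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not warm[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over warm, unvisited neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and warm[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep only blobs within the allowed size range.
            if min_pixels <= len(blob) <= max_pixels:
                blobs.append(blob)
    return blobs


# Hypothetical 4 x 5 thermal frames (degrees C), NOT data from the study.
background = [[22.0] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 24.0  # 2.0 C warmer: one two-pixel blob
frame[3][4] = 23.0                # 1.0 C warmer: below the threshold
blobs = detect_warm_blobs(frame, background)
print(blobs)
```

      In a sketch like this, raising the threshold trades missed faint spots for fewer spurious candidates, which is the kind of trade-off a threshold sweep is meant to expose.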

      We also checked if changing the input length of the video clip fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.
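
      The threshold-based preliminary detection described above can be sketched as follows. This is a minimal illustration, not the DeePosit implementation: the function name `detect_warm_blobs`, the 4-connected flood fill, the size limits, and the toy frame are all assumptions made for the example, and the cooldown-rate check is omitted.

```python
from collections import deque


def detect_warm_blobs(frame, background, min_delta_c=1.6, min_size=2, max_size=50):
    """Return one list of (row, col) pixels per warm connected region.

    A pixel is "warm" when it exceeds the background by at least
    min_delta_c degrees Celsius; regions outside the size limits are dropped.
    """
    rows, cols = len(frame), len(frame[0])
    warm = [
        [frame[r][c] - background[r][c] >= min_delta_c for c in range(cols)]
        for r in range(rows)
    ]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not warm[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected warm neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and warm[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_size <= len(blob) <= max_size:
                blobs.append(blob)
    return blobs


# Toy thermal frame (hypothetical values, degrees Celsius).
background = [[25.0] * 6 for _ in range(5)]
frame = [row[:] for row in background]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    frame[r][c] = 30.0  # a 2x2 warm region
frame[4][5] = 30.0      # a single warm pixel, rejected by min_size
blobs = detect_warm_blobs(frame, background)
```

      Because every parameter is an explicit argument, adapting such a detector to a new arena would amount to changing a few numbers rather than retraining a model.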

      Are these choices robust across a range of laboratory environments?

      We tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      How much better are the demonstrated results compared to a simple object detection pipeline (i.e. FasterRCNN or YOLO on the raw heat images)?

      We compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy than both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm's training set used just a single click inside each spot, allowing faster generation of training sets.

      The method is implemented with a mix of MATLAB and Python.

      That is right.

      One proposed reason why this method is better than a human annotator is that it "is not biased." While they may mean it isn't influenced by what the researcher wants to see, the model they present is still statistically biased since each object class has a different recall score. This wasn't investigated. In general, there was little discussion of the quality of the model. 

      We tested the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions and hence should not create any bias in the results. Specifically, the detection accuracy is similar between urine and feces, and hence should not impose a bias between the object classes.

      Precision scores were not reported.

      In the revised paper we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.
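
      For readers comparing these metrics, the following minimal sketch shows how recall, precision, and F1 relate to detection counts. The counts below are made-up numbers for illustration only; they are not taken from the paper's figures or Excel files.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts.

    tp: detections that match a ground-truth deposit
    fp: detections with no matching ground-truth deposit
    fn: ground-truth deposits that were not detected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one object class.
precision, recall, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
```

      Since F1 is the harmonic mean of precision and recall, a high F1 score requires both to be high, which is why reporting all three addresses the concern about recall alone.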

      Is a recall value of 78.6% good for the types of studies they and others want to carry out? What are the implications of using the resulting data in a study?

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      How do these results compare to the data that would be generated by a "biased human?"

      We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy (Figure 3).

      5 out of the 6 figures in the paper relate not to the method but to results from a study whose data was generated from the method. This makes a paper, which, based on the title, is about the method, much longer and more complicated than if it focused on the method.

      We appreciate the reviewer's comment, but the analysis of this new dataset by DeePosit demonstrates how the algorithm may be used to reveal novel and distinguishable dynamics of urination and defecation activities during social interactions, which were not yet reported. 

      Also, even in the context of the experiments, there is no discussion of the implications of analyzing data that was generated from a method with precision and recall values of only 70-80%. Surely this noise has an effect on how to correctly calculate p-values etc. Instead, the authors seem to proceed like the generated data is simply correct.

      As mentioned above, the increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.  

      Reviewer #3 (Public Review):

      Summary:

      The authors introduce a tool that employs thermal cameras to automatically detect urine and feces deposits in rodents. The detection process involves a heuristic to identify potential thermal regions of interest, followed by a transformer network-based classifier to differentiate between urine, feces, and background noise. The tool's effectiveness is demonstrated through experiments analyzing social preference, stress response, and temporal dynamics of deposits, revealing differences between male and female mice.

      Strengths:

      The method effectively automates the identification of deposits

      The application of the tool in various behavioral tests demonstrates its robustness and versatility.

      The results highlight notable differences in behavior between male and female mice

      We thank the reviewer for the positive assessment of our work.

      Weaknesses:

      The definition of 'start' and 'end' periods for statistical analysis is arbitrary. A robustness check with varying time windows would strengthen the conclusions.

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      The paper could better address the generalizability of the tool to different experimental setups, environments, and potentially other species.

      As mentioned above, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      The results are based on tests of individual animals, and there is no discussion of how this method could be generalized to experiments tracking multiple animals simultaneously in the same arena (e.g., pair or collective behavior tests, where multiple animals may deposit urine or feces).

      At the moment, the algorithm cannot be applied for multiple animals freely moving in the same arena. However, in the revised manuscript we explicitly discussed what is needed for adapting the algorithm to perform such analyses.

      Recommendations for the authors: 

      -  Add a note and/or perform additional calculations to show that the results do not depend on the specific definitions of 'start' and 'end' periods. For instance, vary the time window thresholds and recalculate the statistics using different windows (e.g., 1-5 minutes instead of 1-4 minutes).

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      - Condense Figures 4, 5, and 6 to simplify the presentation. Focus on demonstrating the effectiveness of the tool rather than detailed experimental outcomes, as the primary contribution of this paper is methodological.

      We have added to the revised manuscript one technical figure (Figure 3) comparing the accuracy of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP) as well as comparing its performance to a second human annotator and to YOLOv8. One more partially technical figure (Figure 5) compares the results of the algorithm between white ICR mice in the black arena and black C57BL/6 mice in the white arena. Thus, only Figures 4 and 6 show detailed experimental outcomes.

      - Provide more detail on how the preliminary detection procedure and parameters might need adjustment for different experimental setups or conditions. Discuss potential adaptations for field settings or more complex environments.

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy. We therefore used this value for all of the experiments instead of the 1.1°C used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for thresholds between 1.1°C and 3°C.

      We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.

      Editor's note:

      Should you choose to revise your manuscript, please ensure your manuscript includes full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We have deposited the detailed statistics of each figure in https://github.com/davidpl2/DeePosit/tree/main/FigStat/PostRevision

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates how hearing impairment affects neural encoding of speech, in particular the encoding of hierarchical linguistic information. The current analysis provides incomplete evidence that hearing impairment affects speech processing at multiple levels, since the novel analysis based on HM-LSTM needs further justification. The advantage of this method should also be further explained. The study can also benefit from building a stronger link between neural and behavioral data.

      We sincerely thank the editors and reviewers for their detailed and constructive feedback.

      We have revised the manuscript to address all of the reviewers’ comments and suggestions. The primary strength of our methods lies in the use of the HM-LSTM model, which simultaneously captures linguistic information at multiple levels, ranging from phonemes to sentences. As such, this model can be applied to other questions regarding hierarchical linguistic processing. We acknowledge that our current behavioral results from the intelligibility test may not fully differentiate between the perception of lower-level acoustic/phonetic information and higher-level meaning comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. We aim to explore this connection further in future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors are attempting to use the internal workings of a language hierarchy model, comprising phonemes, syllables, words, phrases, and sentences, as regressors to predict EEG recorded during listening to speech. They also use standard acoustic features as regressors, such as the overall envelope and the envelopes in log-spaced frequency bands. This is valuable and timely research, including the attempt to show differences between normal-hearing and hearing-impaired people in these regards. I will start with a couple of broader questions/points, and then focus my comments on three aspects of this study: The HM-LSTM language model and its usage, the time windows of relevant EEG analysis, and the usage of ridge regression.

      Firstly, as far as I can tell, the OSF repository of code, data, and stimuli is not accessible without requesting access. This needs to be changed so that reviewers and anybody who wants or needs to can access these materials. 

      It is my understanding that keeping the repository private during the review process and making them public after acceptance is standard practice. As far as I understand, although the OSF repository was private, anyone with the link should be able to access it. I have now made the repository public.

      What is the quantification of model fit? Does it mean that you generate predicted EEG time series from deconvolved TRFs, and then give the R2 coefficient of determination between the actual EEG and predicted EEG constructed from the convolution of TRFs and regressors? Whether or not this is exactly right, it should be made more explicit.

      Model fit was measured by spatiotemporal cluster permutation tests (Maris & Oostenveld, 2007) on the contrasts of the timecourses of the z-transformed coefficient of determination (R<sup>2</sup>). For instance, to assess whether words from the attended stimuli better predict EEG signals during the mixed speech compared to words from the unattended stimuli, we used the 150-dimensional vectors corresponding to the word layer from our LSTM model for the attended and unattended stimuli as regressors. We then fit these regressors to the EEG signals at 9 time points (spanning -100 ms to 300 ms around the sentence offsets, with 50 ms intervals). We then conducted one-tailed two-sample t-tests to determine whether the differences in the contrasts of the R<sup>2</sup> timecourses were statistically significant. Note that we did not perform TRF analyses. We have clarified this description in the “Spatiotemporal clustering analysis” section of the “Methods and Materials” on p.10 of the manuscript.

      About the HM-LSTM:

      • In the Methods paragraph about the HM-LSTM, a lot more detail is necessary to understand how you are using this model. Firstly, what do you mean that you "extended" it, and what was that procedure? 

      The original HM-LSTM model developed by Chung et al. (2017) consists of only two levels: the word level and the phrase level (Figure 1b from their paper). By “extending” the model, we mean that we expanded its architecture to include five levels: phoneme, syllable, word, phrase, and sentence. Since our input consists of phoneme embeddings, we cannot directly apply their model, so we trained our model on the WenetSpeech corpus (Zhang et al., 2021), which provides phoneme-level transcripts. We have added this clarification on p.4 of the manuscript.

      • And generally, this is the model that produces most of the "features", or regressors, whichever word we like, for the TRF deconvolution and EEG prediction, correct? 

      Yes, we extracted the 2048-dimensional hidden layer activity from the model to represent features for each sentence in our speech stimuli at the phoneme, syllable, word, phrase and sentence levels. However, we did not perform any TRF deconvolution; instead, we fit these features (reduced to 150 dimensions using PCA) to the EEG signals at 9 time points around the offset of each sentence using ridge regression. We have now added a multivariate TRF (mTRF) analysis following Reviewer 3’s suggestions, and the results showed similar patterns to the current results (see Figure S2). We have added the clarification in the “Ridge regression at different time latencies” section of the “Methods and Materials” on p.10 of the manuscript.

      Results from the mTRF analyses were added on p.7 of the manuscript.

      • A lot more detail is necessary then, about what form these regressors take, and some example plots of the regressors alongside the sentences.

      The linguistic regressors are just 5 150-dimensional vectors, each corresponding to one linguistic level, as shown in Figure 1B.

      • Generally, it is necessary to know what these regressors look like compared to other similar language-related TRF and EEG/MEG prediction studies. Usually, in the case of e.g. Lalor lab papers or Simon lab papers, these regressors take the form of single-sample event markers, surrounded by zeros elsewhere. For example, a phoneme regressor might have a sample up at the onset of each phoneme, and a word onset regressor might have a sample up at the onset of each word, with zeros elsewhere in the regressor. A phoneme surprisal regressor might have a sample up at each phoneme onset, with the value of that sample corresponding to the rarity of that phoneme in common speech. Etc. Are these regressors like that? Or do they code for these 5 linguistic levels in some other way? Either way, much more description and plotting is necessary in order to compare the results here to others in the literature.

      No, these regressors were not like that. They were 150-dimensional vectors (after PCA dimension reduction) extracted from the hidden layers of the HM-LSTM model. After training the model on the WenetSpeech corpus, we ran it on our speech stimuli and extracted representations from the five hidden layers to correspond to the five linguistic levels. As mentioned earlier, we did not perform TRF analyses; instead, we used ridge regression to predict EEG signals around the offset of each sentence, a method commonly employed in the literature (e.g., Caucheteux & King, 2022; Goldstein et al., 2022; Schmitt et al., 2021; Schrimpf et al., 2021). For instance, Goldstein et al. (2022) used word embeddings from GPT-2 to predict ECoG activity surrounding the onset of each word during naturalistic listening. We have included these references on p.3 of the manuscript, and the method is illustrated in Figure 1B.

      • You say that the 5 regressors that are taken from the trained model's hidden layers do not have much correlation with each other. However, the highest correlations are between syllable and sentence (0.22), and syllable and word (0.17). It is necessary to give some reason and interpretation of these numbers. One would think the highest correlation might be between syllable and phoneme, but this one is almost zero. Why would the syllable and sentence regressors have such a relatively high correlation with each other, and what form do those regressors take such that this is the case?

      All the regressors are represented as 2048-dimensional vectors derived from the hidden layers of the trained HM-LSTM model. We applied the trained model to all 284 sentences in our stimulus text, generating a set of 284 × 2048-dimensional vectors. Next, we performed Principal Component Analysis (PCA) on the 2048 dimensions and extracted the first 100 principal components (PCs), resulting in 284 × 100-dimensional vectors for each regressor. These 284 × 100 matrices were then flattened into 28,400-dimensional vectors. Subsequently, we computed the correlation matrix for the z-transformed 28,400-dimensional vectors of our five linguistic regressors. The code for this analysis, lstm_corr.py, can be found in our OSF repository. We have added a section “Correlation among linguistic features” in “Materials and Methods” on p.10 of the manuscript.
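
      The procedure described above (PCA, flattening, z-transformation, correlation) can be sketched as below. This is an illustration under stated assumptions, not the `lstm_corr.py` script: the function names are hypothetical, the activations are random stand-ins for the HM-LSTM hidden layers, and smaller dimensions (64 reduced to 10) are used in place of 2048 reduced to 100 to keep the example light.

```python
import numpy as np


def pca_reduce(activations, n_components):
    """Project (n_sentences, n_dims) activations onto their top principal components."""
    centered = activations - activations.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def level_correlations(level_activations, n_components):
    """Correlate flattened, z-transformed PC scores between linguistic levels."""
    flattened = []
    for activations in level_activations:
        scores = pca_reduce(activations, n_components).ravel()
        flattened.append((scores - scores.mean()) / scores.std())
    return np.corrcoef(np.vstack(flattened))


# Random stand-ins for the five levels (phoneme, syllable, word, phrase, sentence).
rng = np.random.default_rng(0)
levels = [rng.standard_normal((284, 64)) for _ in range(5)]
corr = level_correlations(levels, n_components=10)  # 5 x 5 correlation matrix
```

      One caveat worth keeping in mind with this kind of analysis is that the sign and ordering of principal components are determined separately for each level, so the resulting correlations depend on those conventions.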

      We consider the observed coefficients of 0.17 and 0.22 to be relatively low compared to prior model-brain alignment studies which report correlation coefficients above 0.5 for linguistic regressors (e.g., Gao et al., 2024; Sugimoto et al., 2024). In Chinese, a single syllable can also function as a word, potentially leading to higher correlations between regressors for syllables and words. However, we refrained from overinterpreting the results to suggest a higher correlation between syllable and sentence compared to syllable and word. A paired t-test of the syllable-word coefficients versus syllable-sentence coefficients across the 284 sentences revealed no significant difference (t(28399)=-3.96, p=1). We have incorporated this information into p.5 of the manuscript.

      • If these regressors are something like the time series of zeros along with single sample event markers as described above, with the event marker samples indicating the onset of the relevant thing, then one would think e.g. the syllable regressor would be a subset of the phoneme regressor because the onset of every syllable is a phoneme. And the onset of every word is a syllable, etc.

      All the regressors are aligned to 9 time points surrounding sentence offsets (-100 ms to 300 ms with a 50 ms interval). This is because all our regressors are taken from the HM-LSTM model, where the input is the phoneme representation of a sentence (e.g., “zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4”). For each unit in the sentence, the model generates five 2048-dimensional vectors, each corresponding to one of the five linguistic levels of the entire sentence. We have added the clarification on p.11 of the manuscript.

      For the time windows of analysis:

      • I am very confused, because sometimes the times are relative to "sentence onset", which would mean the beginning of sentences, and sometimes they are relative to "sentence offset", which would mean the end of sentences. It seems to vary which is mentioned. Did you use sentence onsets, offsets, or both, and what is the motivation?

      • If you used onsets, then the results at negative times would not seem to mean anything, because that would be during silence unless the stimulus sentences were all back to back with no gaps, which would also make that difficult to interpret.

      • If you used offsets, then the results at positive times would not seem to mean anything, because that would be during silence after the sentence is done. Unless you want to interpret those as important brain activity after the stimuli are done, in which case a detailed discussion of this is warranted.

      Thank you very much for pointing this out. All instances of “sentence onset” were typos and should be corrected to “sentence offset.” We chose offset because the regressors are derived from the hidden layer activity of our HM-LSTM model, which processes the entire sentence before generating outputs. We have now corrected all the typos. In continuous speech, there is no distinct silence period following sentence offsets. Additionally, lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Therefore, we included a 300 ms interval after sentence offsets in our analysis, as our regressors encompass linguistic levels up to the sentence level. We have added this motivation on p.11 of the manuscript.

      • For the plots in the figures where the time windows and their regression outcomes are shown, it needs to be explicitly stated every time whether those time windows are relative to sentence onset, offset, or something else.

      We completely agree, and thank you very much for the suggestion. We have now added this information to Figures 4-6.

      • Whether the running correlations are relative to sentence onset or offset, the fact that you can have numbers outside of the time of the sentence (negative times for onset, or positive times for offset) is highly confusing. Why would the regressors have values outside of the sentence, meaning before or after the sentence/utterance? In order to get the running correlations, you presumably had the regressor convolved with the TRF/impulse response to get the predicted EEG first. In order to get running correlation values outside the sentence to correlate with the EEG, you would have to have regressor values at those time points, correct? How does this work?

      As mentioned earlier, we did not perform TRF analyses or convolve the regressors. Instead, we conducted regression analyses at each of the 9 time points surrounding the sentence offsets, following standard methods commonly used in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022). The time window of -100 to 300 ms was selected based on prior findings that lexical and phrasal processing typically occurs 200–300 ms after word offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (cf. Gwilliams et al., 2022). We have added the clarification on p. of the manuscript.

      • In general, it seems arbitrary to choose sentence onset or offset, especially if the comparison is the correlation between predicted and actual EEG over the course of a sentence, with each regressor. What is going on with these correlations during the middle of the sentences, for example? In ridge regression TRF techniques for EEG/MEG, the relevant measure is often the overall correlation between the predicted and actual, calculated over a longer period of time, maybe the entire experiment. Here, you have calculated a running comparison between predicted and actual, and thus the time windows you choose to actually analyze can seem highly cherry-picked, because this means that most of the data is not actually analyzed.

      The rationale for choosing sentence offsets instead of onsets is that we are aligning the HM-LSTM model’s activity with EEG responses, and the input to the model consists of phoneme representations of the entire sentence at one time. In other words, the model needs to process the whole sentence before generating representations at each linguistic level. Therefore, the corresponding EEG responses should also align with the sentence offsets, occurring after participants have seen the complete sentence. The ridge regression followed the common practice in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021), and the time window is not cherry-picked but based on prior literature reporting lexical and sublexical processing in these time periods (e.g., Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Gwilliams et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021).

      • In figures 5 and 6, some of the time window portions that are highlighted as significant between the two lines have the lines intersecting. This looks like, even though you have found that the two lines are significantly different during that period of time, the difference between those lines is not of a constant sign, even during that short period. For instance, in figure 5, for the syllable feature, the period of 0 - 200 ms is significantly different between the two populations, correct? But between 0 and 50, normal-hearing are higher, between 50 and 150, hearing-impaired are higher, and between 150 and 200, normal-hearing are higher again, correct? But somehow they still end up significantly different overall between 0 and 200 ms. More explanation of occurrences like these is needed.

      The intersecting lines in Figures 5 and 6 represent the significant time windows for within-group comparisons (i.e., significant model fit compared to 0). They do not depict between-group comparisons, as no significant contrasts were found between the groups. For example, in Figure 1, the significant time windows for the acoustic models are shown separately for the hearing-impaired and normal-hearing groups. No significant differences were observed, as indicated by the sensor topography. We have now clarified this point in the captions for Figures 5 and 6.

      Using ridge regression:

      • What software package(s) and procedure(s) were specifically done to accomplish this? If this is ridge regression and not just ordinary least squares, then there was at least one non-zero regularization parameter in the process. What was it, how did it figure in the modeling and analysis, etc.?

      The ridge regression was performed using custom Python code, making heavy use of the sklearn (v1.12.0) package. We used ridge regression instead of ordinary least squares regression because all our linguistic regressors are 150-dimensional dense vectors, and our acoustic regressors are 130-dimensional vectors (see “Acoustic features of the speech stimuli” in “Materials and Methods”). We kept the default regularization parameter (i.e., 1). This ridge regression method is commonly used in model-brain alignment studies, where the regressors are high-dimensional vectors taken from language models (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021). The code ridge_lstm.py can be found in our OSF repository, and we have added a more detailed description on p.11 of the manuscript.
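
      To make the per-latency analysis concrete, the sketch below fits a ridge regression separately at each of the 9 latencies and scores it with R<sup>2</sup>. It is not the `ridge_lstm.py` script: it uses a closed-form NumPy solution rather than sklearn, all function names are hypothetical, the data are simulated, and the hold-out split is an assumption, since the response does not state how R<sup>2</sup> was evaluated.

```python
import numpy as np


def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc, yc = x_train - x_mean, y_train - y_mean
    # Solve (X^T X + alpha * I) w = X^T y for the weights w.
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    return (x_test - x_mean) @ weights + y_mean


def r2_score(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def r2_per_latency(features, eeg, alpha=1.0):
    """features: (n_sentences, n_dims); eeg: (n_sentences, n_latencies)."""
    n_train = features.shape[0] // 2  # simple hold-out split (an assumption)
    scores = []
    for t in range(eeg.shape[1]):
        pred = ridge_fit_predict(
            features[:n_train], eeg[:n_train, t], features[n_train:], alpha
        )
        scores.append(r2_score(eeg[n_train:, t], pred))
    return np.array(scores)


# Simulated data: 9 latencies from -100 ms to 300 ms in 50 ms steps.
latencies_ms = np.arange(-100, 301, 50)
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 10))
eeg = rng.standard_normal((200, latencies_ms.size))
eeg[:, 4] += features @ np.ones(10)  # inject a feature-driven signal at 100 ms
scores = r2_per_latency(features, eeg)
```

      In this simulation only the latency that carries injected signal yields a clearly positive R<sup>2</sup>; the remaining latencies hover around zero, and can dip below it, because the features carry no information about pure noise.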

      • It sounds like the regressors are the hidden layer activations, which you reduced from 2,048 to 150 non-acoustic, or linguistic, regressors, per linguistic level, correct? So you have 150 regressors, for each of 5 linguistic levels. These regressors collectively contribute to the deconvolution and EEG prediction from the resulting TRFs, correct? This sounds like a lot of overfitting. How much correlation is there from one of these 150 regressors to the next? Elsewhere, it sounds like you end up with only one regressor for each of the 5 linguistic levels. So these aspects need to be clarified.

      • For these regressors, you are comparing the "regression outcomes" for different conditions; "regression outcomes" are the R2 between predicted and actual EEG, which is the coefficient of determination, correct? If this is R2, how is it that you have some negative numbers in some of the plots? R2 should be only positive, between 0 and 1.

      Yes we reduced 2048-dimensional vectors for each of the 5 linguistic levels to 150 using PCA, mainly for saving computational resources. We used ridge regression, following the standard practice in the field (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021). 

      Yes, the regression outcomes are the R<sup>2</sup> values representing the fit between the predicted and actual EEG data. However, we reported normalized R<sup>2</sup> values which are z-transformed in the plots. All our spatiotemporal cluster permutation analyses were conducted using the z-transformed R<sup>2</sup> values. We have added this clarification both in the figure captions and on p.11 of the manuscript. As a side note, R<sup>2</sup> values can be negative because they are not the square of a correlation coefficient. Rather, R<sup>2</sup> compares the fit of the chosen model to that of a horizontal straight line (the null hypothesis). If the chosen model fits the data worse than the horizontal line, then the R<sup>2</sup> value becomes negative: https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative
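
      A small worked example may help illustrate the point about negative values. The numbers below are invented for illustration and the helper name `r_squared` is hypothetical; the formula is the standard coefficient of determination, 1 - SS_res / SS_tot.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

# Predictions close to the data give a value near 1.
r2_good = r_squared(y_true, [1.1, 1.9, 3.2, 3.8])

# Predictions that are worse than the mean of y_true give a negative value:
# SS_res = 20 and SS_tot = 5, so R^2 = 1 - 20 / 5 = -3.0.
r2_bad = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])
```

      The second case shows that whenever the residual sum of squares exceeds the total sum of squares, the result falls below zero, which is consistent with the negative values appearing in the plots before z-transformation.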

      Reviewer #2 (Public Review):

      This study compares neural responses to speech in normal-hearing and hearing-impaired listeners, investigating how different levels of the linguistic hierarchy are impacted across the two cohorts, both in a single-talker and multi-talker listening scenario. It finds that, while normal-hearing listeners have a comparable cortical encoding of speech-in-quiet and attended speech from a multi-talker mixture, participants with hearing impairment instead show a reduced cortical encoding of speech when it is presented in a competing listening scenario. When looking across the different levels of the speech processing hierarchy in the multi-talker condition, normal-hearing participants show a greater cortical encoding of the attended compared to the unattended stream in all speech processing layers - from acoustics to sentence-level information. Hearing-impaired listeners, on the other hand, only have increased cortical responses to the attended stream for the word and phrase levels, while all other levels do not differ between attended and unattended streams.

      The methods for modelling the hierarchy of speech features (HM-LSTM) and the relationship between brain responses and specific speech features (ridge-regression) are appropriate for the research question, with some caveats on the experimental procedure. This work offers an interesting insight into the neural encoding of multi-talker speech in listeners with hearing impairment, and it represents a useful contribution towards understanding speech perception in cocktail-party scenarios across different hearing abilities. While the conclusions are overall supported by the data, there are limitations and certain aspects that require further clarification.

      (1) In the multi-talker section of the experiment, participants were instructed to selectively attend to the male or the female talker, and to rate the intelligibility, but they did not have to perform any behavioural task (e.g., comprehension questions, word detection or repetition), which could have demonstrated at least an attempt to comply with the task instructions. As such, it is difficult to determine whether the lack of increased cortical encoding of Attended vs. Unattended speech across many speech features in hearing-impaired listeners is due to a different attentional strategy, which might be more oriented at "getting the gist" of the story (as the increased tracking of only word and phrase levels might suggest), or instead it is due to hearing-impaired listeners completely disengaging from the task and tuning back in for selected key-words or word combinations. Especially the lack of Attended vs. Unattended cortical benefit at the level of acoustics is puzzling and might indicate difficulties in performing the task. I think this caveat is important and should be highlighted in the Discussion section.

      Thank you very much for the suggestion. We admit that the hearing-impaired listeners might adopt different attentional strategies or potentially disengage from the task due to comprehension difficulties. However, we would like to emphasize that our hearing-impaired participants have extended high-frequency (EHF) hearing loss, with impairment only at frequencies above 8 kHz. Their condition is likely not severe enough to cause them to adopt a markedly different attentional strategy for this task. Moreover, it is possible that our normal-hearing listeners may also adopt varying attentional strategies, yet the comparison still revealed notable differences. We have added the caveat in the Discussion section on p.8 of the manuscript.

      (2) In the EEG recording and preprocessing section, you state that the EEG was filtered between 0.1Hz and 45Hz. Why did you choose this very broadband frequency range? In the literature, speech responses are robustly identified between 0.5Hz/1Hz and 8Hz. Would these results emerge using a narrower and lower frequency band? Considering the goal of your study, it might also be interesting to run your analysis pipeline on conventional frequency bands, such as Delta and Theta, since you are looking into the processing of information at different temporal scales.

      Indeed, we have decomposed the epoched EEG time series for each section into six classic frequency band components (delta 1–3 Hz, theta 4–7 Hz, alpha 8–12 Hz, beta 12–20 Hz, gamma 30–45 Hz) by convolving the data with complex Morlet wavelets as implemented in MNE-Python (version 0.24.0). The number of cycles in the Morlet wavelets was set to frequency/4 for each frequency bin. The power values for each time point and frequency bin were obtained by taking the square root of the resulting time-frequency coefficients. These power values were normalized to reflect relative changes (expressed in dB) with respect to the 500 ms pre-stimulus baseline. This yielded a power value for each time point and frequency bin for each section. We specifically examined the delta and theta bands, and computed the correlation between the regression outcomes for the five linguistic predictors from these bands and those obtained using data from all frequency bands (the R<sup>2</sup> arrays, of shape subjects × sensors × time points, were flattened before computing the correlation). The results showed high correlation coefficients (see the correlation matrix in Supplementary Figure S2 for the attended and unattended speech). Therefore, we opted to use the epoched EEG data from all frequency bands for our analyses. We have added this clarification in the Results section on p.5 and the “EEG recording and preprocessing” section in “Materials and Methods” on p.11 of the manuscript.
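
      The band-versus-broadband comparison amounts to flattening two arrays of regression outcomes and correlating them. The sketch below illustrates only that final step, with simulated arrays standing in for the real R<sup>2</sup> values; the array sizes and the helper name `flattened_correlation` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_sensors, n_times = 20, 64, 41  # illustrative sizes only

# Simulated regression outcomes: the "delta" array is a noisy copy of the
# "broadband" array, so the two should correlate strongly.
r2_broadband = rng.standard_normal((n_subjects, n_sensors, n_times))
r2_delta = r2_broadband + 0.3 * rng.standard_normal(r2_broadband.shape)

def flattened_correlation(a, b):
    """Pearson correlation between two same-shaped arrays after flattening."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

corr = flattened_correlation(r2_broadband, r2_delta)
```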

      (3) A paragraph with more information on the HM-LSTM would be useful to understand the model used without relying on the Chung et al. (2017) paper. In particular, I think the updating mechanism of the model should be clarified. It would also be interesting to modify the updating factor of the model, along the lines of Schmitt et al. (2021), to assess whether a HM-LSTM with faster or slower updates can better describe the neural activity of hearing-impaired listeners. That is, perhaps the difference between hearing-impaired and normal-hearing participants lies in the temporal dynamics, and not necessarily in a completely different attentional strategy (or disengagement from the stimuli, as I mentioned above).

      Thank you for the suggestion. We have added more details on our HM-LSTM model on p.10 “Hierarchical multiscale LSTM model” in “Materials and Methods”: Our HM-LSTM model consists of 4 layers. At each layer, the model implements a COPY or UPDATE operation at each time step t. The COPY operation maintains the current cell state without any changes until it receives a summarized input from the lower layer. The UPDATE operation occurs when a linguistic boundary is detected in the layer below, but no boundary was detected at the previous time step t-1. In this case, the cell updates its summary representation, similar to standard RNNs. We agree that exploring modifications to the model’s updating factor would be an interesting direction. However, since we have already observed contrasts between normal-hearing and hearing-impaired listeners using the current model’s update parameters, we believe discussing additional hypotheses would overextend the scope of this paper.
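
      The choice between operations can be sketched as a small decision function. This is an illustration, not the model's implementation: the function name and boolean arguments are hypothetical, and the FLUSH branch follows the original formulation in Chung et al. (2017) rather than the description above, which mentions only COPY and UPDATE.

```python
def select_operation(boundary_below_now, boundary_here_prev):
    """Pick the cell operation for one layer at time step t.

    boundary_below_now: a boundary was detected in the layer below at step t.
    boundary_here_prev: a boundary was detected in this layer at step t-1.
    """
    if boundary_here_prev:
        # Assumption taken from Chung et al. (2017): after a boundary in this
        # layer, the cell is re-initialised (FLUSH).
        return "FLUSH"
    if boundary_below_now:
        # The lower layer has finished a unit: integrate its summary.
        return "UPDATE"
    # Nothing new from below: keep the cell state unchanged.
    return "COPY"

operations = [
    select_operation(boundary_below_now=False, boundary_here_prev=False),
    select_operation(boundary_below_now=True, boundary_here_prev=False),
]
```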

      (4) When explaining how you extracted phoneme information, you mention that "the inputs to the model were the vector representations of the phonemes". It is not clear to me whether you extracted specific phonetic features (e.g., "p" sound vs. "b" sound), or simply the phoneme onsets. Could you clarify this point in the text, please?

      The model inputs were individual phonemes from two sentences, each transformed into a 1024-dimensional vector using a simple lookup table. This lookup table stores embeddings for a fixed dictionary of all unique phonemes in Chinese. This approach is a foundational technique in many advanced NLP models, enabling the representation of discrete input symbols in a continuous vector space. We have added this clarification on p.10 of the manuscript.
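
      A lookup-table embedding of this kind can be sketched as follows. The six-symbol inventory and the randomly initialised table are placeholders; in a trained model the table entries would be learned, and the real inventory covers all unique phonemes in Chinese.

```python
import numpy as np

phoneme_inventory = ["a", "i", "u", "b", "p", "m"]  # hypothetical toy inventory
embedding_dim = 1024

rng = np.random.default_rng(0)
# One row per phoneme; random values stand in for learned embeddings.
embedding_table = rng.standard_normal((len(phoneme_inventory), embedding_dim))
phoneme_to_index = {ph: i for i, ph in enumerate(phoneme_inventory)}

def embed(phonemes):
    """Map a phoneme sequence to an array of shape (len(phonemes), embedding_dim)."""
    indices = [phoneme_to_index[ph] for ph in phonemes]
    return embedding_table[indices]

vectors = embed(["b", "a", "p", "a"])
```

      Each occurrence of the same phoneme maps to the same vector, so the two "a" rows in `vectors` are identical while "b" and "p" differ.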

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. To address this limitation, the authors should consider evaluating alternative models and methods. For example, directly using spectrograms, discrete phoneme/syllable/word coding as features, and performing feature-based temporal response function (TRF) analysis could serve as valuable baseline models. This approach would provide a more comprehensive evaluation of the neural encoding of linguistic information.

      Our acoustic features are indeed taken directly from the speech streams: the broadband envelopes and the log-mel spectrograms. The amplitude envelope of the speech signal was extracted using the Hilbert transform. The 129-dimension spectrogram and 1-dimension envelope were concatenated to form a 130-dimension acoustic feature at every 10 ms of the speech stimuli. Given the duration of our EEG recordings, which span over 10 minutes, conducting multivariate TRF (mTRF) analysis with such high-dimensional predictors was not feasible. Instead, we used ridge regression to predict EEG responses across 9 temporal latencies, ranging from -100 ms to +300 ms, with additional 50 ms latencies surrounding sentence offsets. To evaluate the model's performance, we extracted the R<sup>2</sup> values at each latency, providing a temporal profile of regression performance over the analyzed time period. This approach is conceptually similar to TRF analysis.
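
      To make the envelope extraction concrete, the sketch below computes the magnitude of the analytic signal with an FFT-based Hilbert transform and applies it to a synthetic amplitude-modulated tone. It is not the authors' code; the test signal and sampling rate are invented, and the evenly spaced 50 ms latency grid is an assumption that happens to yield 9 latencies between -100 ms and +300 ms.

```python
import numpy as np

def amplitude_envelope(signal):
    """Magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

sample_rate = 1000                                  # Hz, illustrative
t = np.arange(sample_rate) / sample_rate            # one second of samples
carrier = np.sin(2 * np.pi * 100 * t)               # 100 Hz tone
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)   # slow 4 Hz modulation
envelope = amplitude_envelope(modulator * carrier)

# Assumed latency grid: -100 ms to +300 ms in 50 ms steps.
latencies_ms = list(range(-100, 301, 50))
```

      For this synthetic signal the recovered envelope matches the slow modulator, which is the property that makes the Hilbert envelope a useful broadband speech feature.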

      We agree that including baseline models for the linguistic features is important, and we have now added results from mTRF analysis using phoneme, syllable, word, phrase, and sentence rates as discrete predictors (i.e., marking a value of 1 at each unit boundary offset). Our EEG data spans the entire 10-minute duration for each condition, sampled at 10-ms intervals. The TRF results for our main comparison, attended versus unattended conditions, showed similar patterns to those observed using features from our HM-LSTM model. At the phoneme and syllable levels, normal-hearing listeners showed marginally significantly higher TRF weights for attended speech compared to unattended speech at approximately -80 to 150 ms after phoneme offsets (t=2.75, Cohen’s d=0.87, p=0.057), and 120 to 210 ms after syllable offsets (t=3.96, Cohen’s d=0.73, p=0.083). At the word and phrase levels, normal-hearing listeners exhibited significantly higher TRF weights for attended speech compared to unattended speech at 190 to 290 ms after word offsets (t=4, Cohen’s d=1.13, p=0.049), and around 120 to 290 ms after phrase offsets (t=5.27, Cohen’s d=1.09, p=0.045). For hearing-impaired listeners, marginally significant effects were observed at 190 to 290 ms after word offsets (t=1.54, Cohen’s d=0.6, p=0.059), and 180 to 290 ms after phrase offsets (t=3.63, Cohen’s d=0.89, p=0.09). These results have been added on p.7 of the manuscript, and the corresponding figure is included as Supplementary F2.
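
      A discrete rate predictor of the kind described (a value of 1 at each unit boundary offset, sampled every 10 ms) can be built as in the sketch below. The offset times are invented and the function name `boundary_predictor` is hypothetical.

```python
def boundary_predictor(offsets_ms, duration_ms, step_ms=10):
    """Return a 0/1 list with a 1 in the sample bin containing each offset."""
    n_samples = duration_ms // step_ms
    predictor = [0] * n_samples
    for offset in offsets_ms:
        index = offset // step_ms
        if 0 <= index < n_samples:
            predictor[index] = 1
    return predictor

word_offsets_ms = [320, 710, 1180]  # hypothetical word offsets in a 1.5 s excerpt
predictor = boundary_predictor(word_offsets_ms, duration_ms=1500)
```

      One such impulse train per linguistic level (phoneme, syllable, word, phrase, sentence) would then serve as the regressors for the TRF estimation.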

      It is not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. Specifically, the results presented in Figure 3C are somewhat confusing. While the phonemes are labeled, the syllables, words, phrases, and sentences are not, making it difficult to interpret how the model distinguishes between these levels of linguistic information. The claim that "Hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels" is not convincingly supported by the provided visualizations. To strengthen their argument, the authors should use more quantified metrics to demonstrate that the model indeed captures phrase, word, syllable, and phoneme information at different layers. This is a crucial prerequisite for the subsequent analyses and claims about the hierarchical processing of linguistic information in the brain.

      Quantitative measures such as mutual information, clustering metrics, or decoding accuracy for each linguistic level could provide clearer evidence of the model's effectiveness in this regard.

      In Figure 3C, we used color-coding to represent the activity of five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The results demonstrate that the phoneme layer effectively distinguishes different phonemes, while the higher linguistic layers do not. We believe these findings provide evidence that different layers capture distinct linguistic information. Additionally, we computed the correlation coefficients between each pair of linguistic predictors, as shown in Figure 3B. We think this analysis serves a similar purpose to computing the mutual information between pairs of hidden-layer activities for our constructed sentences. Furthermore, the mTRF results based on rate models of the linguistic features we presented earlier align closely with the regression results using the hidden-layer activity from our HM-LSTM model. This further supports the conclusion that our model successfully captures relevant information across these linguistic levels. We have added the clarification on p.5 of the manuscript.

      The formulation of the regression analysis is somewhat unclear. The choice of sentence offsets as the anchor point for the temporal analysis, and the focus on the [-100ms, +300ms] interval, needs further justification. Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time. Additionally, explaining the rationale behind choosing this specific time window and how it aligns with the temporal dynamics of speech processing would enhance the clarity and validity of the regression analysis.

      Thank you for pointing this out. We chose this time window as lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (e.g., Gwilliams et al., 2022). Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. We have added this clarification on p.12 of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As I mentioned, I think the OSF repo needs to be changed to give anyone access. I would recommend pursuing the lines of thought I mentioned in the public review to make this study complete and to allow it to fit into the already existing literature to facilitate comparisons.

      Yes, the OSF folder is now public. We have made revisions following all reviewers’ suggestions.

      There are some typos in figure labels, e.g. 2B.

      Thank you for pointing it out! We have now corrected the typo in Figure 2B.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was able to access all of the audio files and code for the study, but no EEG data was shared in the OSF repository. Unless there is some ethical and/or legal constraint, my understanding of eLife's policy is that the neural data should be made publicly available as well.

      The preprocessed EEG data are available in .npy format in the OSF repository.

      (2) The line-plots in Figures 4B, 5B, and 6B have very similar colours. They would be easier to interpret if you changed the line appearance as well as the colours. E.g., dotted line for hearing-impaired listeners, thick line for normal-hearing.

      Thank you for the suggestion! We have now used thicker lines for normal-hearing listeners in all our line plots.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors may consider presenting raw event-related potentials (ERPs) or spatiotemporal response profiles before delving into the more complex regression encoding analysis. This would provide a clearer foundational understanding of the neural activity patterns. For example, it is not clear if the main claims, such as the neural activity in the normal-hearing group encoding phonetic information in attended speech better than in unattended speech, are directly observable. Showing ERP differences or spatiotemporal response pattern differences could support these claims more straightforwardly. Additionally, training pattern classifiers to test if different levels of information can be decoded from EEG activity in specific groups could provide further validation of the findings.

      We have now included results from more traditional mTRF analyses using phoneme, syllable, word, phrase, and sentence rates as baseline models (see p.7 of the manuscript and Figure S3). The results show similar patterns to those observed in our current analyses. While we agree that classification analyses would be very interesting, our regression analyses have already demonstrated distinct EEG patterns for each linguistic level. Consequently, classification analyses would likely yield similar results unless a different method for representing linguistic information at these levels is employed. To the best of our knowledge, no other computational model currently exists that can simultaneously represent these linguistic levels.

      (2) Is there any behavioral metric suggesting that these hearing-impaired participants do have deficits in comprehending long sentences? The self-rated intelligibility is useful, but cannot fully distinguish between perceiving lower-level phonetic information vs longer sentence comprehension.

      In the current study, we included only self-rated intelligibility tests. We acknowledge that this approach might not fully distinguish between the perception of lower-level phonetic information and higher-level sentence comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. Furthermore, our primary aim was to use the behavioral results to demonstrate that our hearing-impaired listeners experienced speech comprehension difficulties in multi-talker environments, while relying on the EEG data to investigate comprehension challenges at various linguistic levels.

      Minor:

      (1) Page 2, second line in Introduction, "Phonemes occur over ..." should be lowercase.

      According to APA format, the first word after the colon is capitalized if it begins a complete sentence (https://blog.apastyle.org/apastyle/2011/06/capitalization-after-colons.html). Here the sentence is a complete sentence, so we used uppercase for “phonemes”.

      (2) Page 8, second paragraph "...-100ms to 100ms relative to sentence onsets", should it be onsets or offsets?

      This is a typo and it should be offsets. We have now corrected it.

      References

      Bemis, D. K., & Pylkkanen, L. (2011). Simple composition: An MEG investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience, 31(8), 2801–2814.

      Gao, C., Li, J., Chen, J., & Huang, S. (2024). Measuring meaning composition in the human brain with composition scores from large language models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11295–11308). Association for Computational Linguistics.

      Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., … Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), Article 3.

      Gwilliams, L., King, J.-R., Marantz, A., & Poeppel, D. (2022). Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nature Communications, 13(1), Article 1.

      Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600), 453–458.

      Li, J., Lai, M., & Pylkkänen, L. (2024). Semantic composition in experimental and naturalistic paradigms. Imaging Neuroscience, 2, 1–17.

      Li, J., & Pylkkänen, L. (2021). Disentangling semantic composition and semantic association in the left temporal lobe. Journal of Neuroscience, 41(30), 6526–6538.

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.

      Schmitt, L.-M., Erb, J., Tune, S., Rysop, A. U., Hartwigsen, G., & Obleser, J. (2021). Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49), eabi6070.

      Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118.

      Sugimoto, Y., Yoshida, R., Jeong, H., Koizumi, M., Brennan, J. R., & Oseki, Y. (2024). Localizing Syntactic Composition with Left-Corner Recurrent Neural Network Grammars. Neurobiology of Language, 5(1), 201–224.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The study presents some useful findings on Mendelian randomization-phenome-wide association, with BMI associated with health outcomes, and there is a focus on sex differences. Although there are some solid phenotype and genotype data, some of the data are incomplete and could be better presented, perhaps benefiting from more rigorous approaches. Confirmation and further assessment of the observed sex differences will add further value.

      Thank you for your positive comments. We have revised the analysis based on your feedback and that from the two reviewers. Specifically, we implemented a stricter multiple testing correction approach, improved the figures, included additional figures in the Supplementary Materials, considered the sex differences more rigorously and reported them in more detail. A comprehensive description of the revisions is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uses information from the UK Biobank and aims to investigate the role of BMI on various health outcomes, with a focus on differences by sex. They confirm the relevance of many of the well-known associations between BMI and health outcomes for males and females and suggest that associations for some endpoints may differ by sex. Overall their conclusions appear supported by the data. The significance of the observed sex variations will require confirmation and further assessment.

      Strengths:

      This is one of the first systematic evaluations of sex differences between BMI and health outcomes. The hypothesis that BMI may be associated with health differentially based on sex is relevant and even expected. As muscle is heavier than adipose tissue, and as men typically have more muscle than women, as a body composition measure BMI is sometimes prone to classifying even normal weight/muscular men as obese, while this measure is more lenient when used in women. Confirmation of the many well-known associations is as expected and attests to the validity of their approach. Demonstration of the possible sex differences is interesting, with this work raising the need for further study.

      Thank you for your valuable comments. We are grateful for the time and effort you have devoted to reviewing our manuscript. We have strengthened our paper by adding your insightful comment about the rationale for sex-specific analysis to the introduction:

      Weaknesses:

      (1) Many of the statistical decisions appeared to target power at the expense of quality/accuracy. For example, they chose to use self-reported information rather than doctor diagnoses for disease outcomes for which both types of data were available.

      Thank you for your valuable comments. We apologize for the lack of clarity in our original description of the phenotypes. Information about health in the UK Biobank was obtained at baseline from tests, measurements and self-reports. Subsequently, comprehensive data linkage to hospital admissions, death registries and cancer registries was implemented. However, data linkage to primary care data, such as doctor diagnoses, has not been comprehensively implemented for the UK Biobank, possibly for logistic reasons. Doctor diagnoses are only available for about half the cohort (https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/health-related-outcomes-data). So, we used self-reported diagnoses because they are substantially more comprehensive than the doctor diagnoses. We have explained this point by making the following change to the Methods:

      “Where attributes were available from both self-report and doctor diagnosis, we used self-reports. This is because comprehensive record linkage to doctor diagnoses has not yet been fully implemented for the UK Biobank, so information from doctor diagnoses may not fully represent the broader UK Biobank cohort.”

      (2) Despite known problems and bias arising from the use of one sample approach, they chose to use instruments from the UK Biobank instead of those available from the independent GIANT GWAS, despite the difference in sample size being only marginally greater for UKB for the context. With the way the data is presented, it is difficult to assess the extent to which results are compatible across approaches.

      Thank you for your comments. We agree completely about the issues with a one-sample approach; please accept our apologies for not explaining our rationale. The sex-specific GIANT GWAS study is similar in size to the UK Biobank GWAS. However, the sex-specific GIANT GWAS is much less densely genotyped (~2.5 million variants) than the sex-specific UK Biobank GWAS (~10 million variants), so it has less power, hence our use of the UK Biobank. To make this clear, we have added the number of variants in each study to the Methods section. Nevertheless, we also repeated the analysis using sex-specific GIANT, as now given in the methods by making the following change:

      We amended the description in the first paragraph of the results section:

      “Initial analysis using sex-specific BMI from GIANT yielded similar estimates as when using sex-specific BMI from the UK Biobank but had fewer SNPs resulting in wider confidence intervals (S Table 1) and fewer significant associations (S Table 1). Analysis using sex-combined GIANT yielded more significant associations but lacks granularity, so we presented the results obtained using sex-specific BMI from the UK Biobank.”

      In the discussion we also made the following changes:

      “Tenth, although this study primarily utilized sex-specific BMI, we also conducted analyses using overall BMI from GIANT including the UK Biobank, which gave a generally similar interpretation (S Table 1). Using sex-specific BMI from the UK Biobank and GIANT may lead to lower statistical power than using overall population BMI but allows for the detection of traits that are affected differently by BMI by sex. Including findings from the overall population BMI from sex-combined GIANT (S Table 1) makes the results more comparable to previous similar studies.”

      (3) The approach to multiple testing correction appears very lenient, although the lack of accuracy in the reporting makes it difficult to know what was done exactly. The way it reads, FDR correction was done separately for men, and then for women (assuming that the duplication in tests following stratification does not affect the number of tests). In the second stage, they compared differences by sex using Z-test, apparently without accounting for multiple testing.

      Thank you, we have accounted for multiple comparisons when considering differences by sex and have made corresponding changes. Specifically, in the methods, we changed:

      “We obtained differences by sex using a z-test (Paternoster et al., 1998), which as recommended was on a linear scale for dichotomous outcomes (Knol et al., 2007; Rothman, 2008), then we identified which ones remained after allowing for false discovery”
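
      The two steps described here can be sketched in a few lines. The z statistic is the standard difference between two independent estimates divided by the root sum of their squared standard errors; the Benjamini-Hochberg procedure is used below as one common way of allowing for false discovery and is an assumption, since the response does not name the method. All numbers and function names are illustrative.

```python
import math

def sex_difference_z(beta_men, se_men, beta_women, se_women):
    """z-test for the difference between two independent estimates."""
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests pass the BH procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * q.
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            largest_k = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_k:
            passed[i] = True
    return passed

z, p = sex_difference_z(beta_men=0.30, se_men=0.05, beta_women=0.10, se_women=0.05)
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
```

      In the toy list, three of the four p-values fall below 0.05 but only the smallest survives the correction, which mirrors the drop from nominally significant differences to those retained after allowing for false discovery.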

      We have made the following changes to the results section:

      “We found significant differences by sex in the associations of BMI with 105 health-related attributes (p-value<0.05); 46 phenotypes remained after allowing for false discovery (Table 1). Of these 46 differences most (35) were in magnitude but not direction, such as for SHBG, ischemic heart disease, heart attack, and facial aging, while 11 were directionally different.

      Notably, BMI was more strongly positively associated with myocardial infarction, major coronary heart disease events, ischemic heart disease, heart attack, and facial aging in men than in women. BMI was more strongly positively associated with diastolic blood pressure, and hypothyroidism/myxoedema in women than men. BMI was more strongly inversely associated with LDL-c, hay fever and allergic rhinitis in men than women. BMI was more strongly inversely associated with SHBG in women than men.

      BMI was inversely associated with ApoB, iron deficiency anemia, hernia, and total testosterone in men, while positively associated with these traits in women (Table 1). BMI was inversely associated with sensitivity/hurt feelings, and ever seeking medical advice for nerves, anxiety, tension, or depression in men. However, BMI was positively associated with sensitivity/hurt feelings and ever seeking medical advice for these same issues in women. BMI was positively associated with muscle or soft tissue injuries and haemorrhage from respiratory passages in men, whilst inversely associated with these traits in women.”

      We have correspondingly amended the discussion to reflect these changes by adding:

      “Whether the difference in ischemic heart disease rates between men and women that emerged in the US and the UK in the late 19th century (Nikiforov & Mamaev, 1998) is explained by rising BMI remains to be determined.”

      (4) Presentation lacks accuracy in a few places, hence assessment of the accuracy of the statements made by the authors is difficult.

      Thank you, we have revised the whole manuscript in order to improve clarity.

      (5) Conclusion (Abstract) "These findings highlight the importance of retaining a healthy BMI" is rather uninformative, especially as they claim that for some attributes the effects of BMI may be opposite depending on sex/gender.

      Thank you for your comments. We have changed the conclusion of the abstract, as given below:

      “Our study revealed that BMI might affect a wide range of health-related attributes and also highlights notable sex differences in its impact, including opposite associations for certain attributes, such as ApoB; and stronger effects in men, such as for cardiovascular diseases. Our findings underscore the need for nuanced, sex-specific policy related to BMI to address inequities in health.”

      We have changed the Impact statement, as given below:

      “BMI may affect a wide range of health-related attributes and there are notable sex differences in its impact, including opposite associations for certain attributes, such as ApoB; and stronger effects in men, such as for cardiovascular diseases. Our findings underscore the need for nuanced, sex-specific policy related to BMI.”

      We have changed the conclusion of the paper, as given below:

      “Our contemporary systematic examination found BMI associated with a broad range of health-related attributes. We also found significant sex differences in many traits, such as for cardiovascular diseases, underscoring the importance of addressing higher BMI in both men and women possibly as means of redressing differences in life expectancy. Ultimately, our study emphasizes the harmful effects of obesity and the importance of nuanced, sex-specific policy related to BMI to address inequities in health.”

      Reviewer #2 (Public review):

      Summary:

      In this present Mendelian randomization-phenome-wide association study, the authors found BMI to be positively associated with many health-related conditions, such as heart disease, heart failure, and hypertensive heart disease. They also found sex differences in some traits such as cancer, psychological disorders, and ApoB.

      Strengths:

      The use of the UK-biobank study with detailed phenotype and genotype information.

      Thank you for your valuable comments. We are grateful for the time and effort you have devoted to reviewing our manuscript.

      Weaknesses:

      (1) Previous studies have performed this analysis using the same cohort, with in-depth analysis. See this paper: Searching for the causal effects of body mass index in over 300,000 participants in UK Biobank, using Mendelian randomization. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007951

      Thank you for your valuable comments. We checked the paper carefully. It gives sex-specific estimates when the outcome was assessed in different ways in men and women, for example the question about number of children was asked in terms of live births in women and number of children fathered in men. In addition, for some significant findings, the authors investigated differences by sex. However, the paper did not use sex-specific BMI or sex-specific outcomes systematically. We have added this paper to our introduction and amended the text to explain the novelty of our study compared to previous studies.

      “Previous phenome-wide association studies using MR (MR-PheWASs) have identified impacts of sex-combined BMI on endocrine disorders, circulatory diseases, inflammatory and dermatological conditions, some biomarkers and feelings of nervousness (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019), but did not systematically use sex-specific BMI for the exposure or sex-specific outcomes.”

      (2) I believe that the authors' claim, "To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes," is not well supported. They have not cited a relevant paper that conducted both overall and sex-stratified PheWAS using UK Biobank data with a detailed analysis. Given the prior study linked above, I am uncertain about the additional contributions of the present research.

      Thank you for your valuable comments, and please accept our apologies for this oversight. As explained above, we have checked very carefully. There are three previous PheWASs for BMI: Hyppönen et al., 2019, Millard et al., 2015 and Millard et al. 2019. Hyppönen et al., 2019 and Millard et al., 2015 are not sex-specific. Millard et al. 2019 used sex-combined instruments but some sex-specific outcomes when the questions were asked sex-specifically, such as age at puberty asked as “age when periods started (menarche)” in women and “relative age of first facial hair” and “relative age voice broke” in men. When they found a factor significantly associated with BMI, they sometimes analyzed it further, including by sex, but they did not conduct the analysis systematically for men and women with sex-specific BMI and sex-specific outcomes. We have amended the introduction to clarify this point.

      “To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019). To address this gap, we conducted a sex-specific PheWAS, using the largest available sex-specific GWAS of BMI, to explore the impact of sex-specific BMI on sex-specific health-related attributes.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Presentation, accuracy, and referencing:

      (1) The quality of the English language needs to be checked, including that all sentences carry all components required (including verbs).

      We thank the reviewer for this suggestion. The manuscript has undergone language editing by a native English-speaker, with particular attention to grammatical completeness (including verb consistency and sentence structure). We have also clarified ambiguities and inconsistencies in terms pointed out by the native English speakers. All revisions have been implemented in the updated manuscript.

      (2) The accuracy of statements needs to be checked. For example, in lines 82-83 it is not true that 2015/2019 was 'before the advent of large-scale GWAs studies". In the context of the above in lines 83-85, how can reference be made to a study published in 2020 calling that 'previous' MR studies and how a trial published in 2016 is 'recent'? Please revise, and please also check the manuscript for any other issues with accuracy of this kind.

      We thank the reviewer for this suggestion. We have checked the manuscript and revised these sentences to be clearer, by making the following change.

      “Previous phenome-wide association studies using MR (MR-PheWASs) have identified impacts of sex-combined BMI on endocrine disorders, circulatory diseases, inflammatory and dermatological conditions, some biomarkers and feelings of nervousness (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019), but did not systematically use sex-specific BMI for the exposure or sex-specific outcomes. Previous MR studies and trials of incretins have expanded our knowledge about a broad range of effects of BMI (Larsson et al., 2020; Marso et al., 2016).”

      (3) The adequacy of referencing will need to be checked, e.g. line 136 "as recommended by UK biobank" is vague and needs to be referenced.

      We thank the reviewer for this suggestion. We have added citations.

      “We categorized attributes as age at recruitment, physical measures, lifestyle and environmental, medical conditions, operations, physiological factors, cognitive function, health and medical history, sex-specific factors, blood assays and urine assays, based on the UK Biobank categories (https://biobank.ndph.ox.ac.uk/ukb/cats.cgi).”

      (4) The accurate use of terminology needs to be checked. For example, BMI is a measure of adiposity, while high BMI (typically >30) is used to index obesity.

      We thank you for your comments. We have changed the descriptions into “overweight/obesity” throughout.

      (5) Figure 1, Please check that complete information is given for 'selection criteria' and that the rationale for all information included is clear. For example, it is currently unclear what is the distinction between the bottom two sections which both present a number of features included in the analyses? Also, the Box detailing exclusion of 3585 variables does not give the criteria for these exclusions. Please add.

      Thank you for your comments. We have revised and re-presented Figure 1. Specifically, we have revised the bottom two sections to give each reason for exclusion and the number excluded for that reason. The updated “Excluded: 3,572 phenotypes, for the reason listed below:” box now contains bullet points giving each reason for exclusion and the corresponding number (e.g. age of certain diseases/disorders onset: 26, alcohol: 56).

      (6) Figure 4, does not look to be of typical publication quality.

      We thank you for your comments. We have used different colors to make it smaller and more readable. Please see Table 1.

      Analyses:

      (1) As it stands, it is very difficult for a reader to confirm the conclusion that similar findings are obtained both when using instruments from the UKB and GIANT based on data presented (S Table 1 and 2). I suggest two things.

      a) Organise S Table 1 and 2 by significance and category, with separation by highlighting for those which are significant under correction. I would consider merging these two tables, such that it would be easy for the reader to make the comparisons side by side. Consider presenting separate tables for the analyses for women and men.

      We thank you for your comments. We have followed your helpful advice and merged S Table 1 and S Table 2 into S Table 1. Furthermore, we have also merged S Table 5 to S Table 1.

      b) In S Table 3, please add information from related comparisons using the GIANT instruments. To support the authors' claim that associations are similar, but only the precision of estimation differed, you could consider adding information for numbers of associations for those that are directionally consistent and which have an association at least under nominal significance. For associations where this does not hold, I would refrain from making a claim that the results are not affected by the choice of instrument (or biases relating to the analysis conducted).

      We thank you for your comments. Among 42 significant sex-specific associations identified in both the UK Biobank and the sex-specific GIANT consortium for men, all showed consistent directions of effect. Similarly, for women, all of the 45 significant associations exhibited consistent directions for UK Biobank compared with GIANT instruments.

      In the sex-specific UK Biobank, there are 203 significant associations in men, and 232 significant associations in women. We have added: in the sex-specific GIANT, there are 46 significant associations in men, and 84 significant associations in women. In the sex-combined GIANT, there are 246 significant associations in men, and 276 significant associations in women. We have provided all this information in S Table 2.

      We added the following descriptions at the end of the results section:

      “Of the 42 significant sex-specific associations identified in both the UK Biobank and the sex-specific GIANT consortium for men, all were directionally consistent. Similarly, for women, all 45 such significant associations were directionally consistent.”

      We amended the following descriptions in the first paragraph of the results section:

      “Initial analysis using sex-specific BMI from the GIANT yielded similar estimates as when using sex-specific BMI from the UK Biobank but had fewer SNPs resulting in wider confidence intervals (S Table 1) and fewer significant associations (S Table 2). Analysis using sex-combined GIANT yielded more significant associations but lacks granularity, so we presented the results obtained using sex-specific BMI from the UK Biobank.”

      In the methods, we changed:

      “We obtained differences by sex using a z-test (Paternoster et al., 1998), which as recommended was on a linear scale for dichotomous outcomes (Knol et al., 2007; Rothman, 2008), then we identified which ones remained after allowing for false discovery.”

      We have made the following changes to the results section:

      “We found significant differences by sex in the associations of BMI with 105 health-related attributes (p-value<0.05); 46 phenotypes remained after allowing for false discovery (Table 1). Of these 46 differences most (35) were in magnitude but not direction, such as for SHBG, ischemic heart disease, heart attack, and facial aging, while 11 were directionally different.

      Notably, BMI was more strongly positively associated with myocardial infarction, major coronary heart disease events, ischemic heart disease, heart attack, and facial aging in men than in women. BMI was more strongly positively associated with diastolic blood pressure, and hypothyroidism/myxoedema in women than men. BMI was more strongly inversely associated with LDL-c, hay fever and allergic rhinitis in men than women. BMI was more strongly inversely associated with SHBG in women than men.

      BMI was inversely associated with ApoB, iron deficiency anemia, hernia, and total testosterone in men, while positively associated with these traits in women (Table 1). BMI was inversely associated with sensitivity/hurt feelings, and ever seeking medical advice for nerves, anxiety, tension, or depression in men. However, BMI was positively associated with sensitivity/hurt feelings and ever seeking medical advice for these same issues in women. BMI was positively associated with muscle or soft tissue injuries and haemorrhage from respiratory passages in men, whilst inversely associated with these traits in women.”

      (2) It is not clear what statistical criteria were used to determine sex differences, and the strategy/presentation should be clarified. In lines 229-231, it is implied that the 'significance' in one gender, but not in the other is used to indicate a difference. However, 'comparison of p-values' is not a valid statistical approach, and a more formal test (accounting for multiple testing would be warranted). It may be that a systematic approach has been implemented, but please check that it is adequately and accurately described to the reader.

      Please accept our apologies for being unclear. Corrections for multiple comparisons assume independent phenotypes; here, however, some phenotypes are not independent, so applying such a correction in men and women separately is quite strict. We have added a correction for multiple comparisons to the assessment of sex differences, which is now given in Table 1. Initially, there were 105 associations with a significant sex difference (p-value for sex difference < 0.05), and 46 remained after FDR correction (Table 1).
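      The sex-difference test and the false-discovery step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes independent estimates in men and women, and it assumes the Benjamini-Hochberg procedure, because the response says only "FDR correction". The function names and any input values are hypothetical.

```python
import math


def z_test_difference(beta_men, se_men, beta_women, se_women):
    """Z-test for the equality of two independent coefficients
    (the form given by Paternoster et al., 1998)."""
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it survives FDR control
    at level alpha (Benjamini-Hochberg step-up; assumed procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            survives[idx] = True
    return survives
```

      In this sketch, each phenotype would contribute one p-value from `z_test_difference`, and `benjamini_hochberg` would then flag which sex differences remain; the nominal threshold of 0.05 matches the one quoted in the response.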

      Furthermore, we have made additional minor changes to clarify the wording.

      Knol, M. J., van der Tweel, I., Grobbee, D. E., Numans, M. F., & Geerlings, M. I. (2007). Estimating interaction on an additive scale between continuous determinants in a logistic regression model. Int J Epidemiol, 36(5), 1111-1118.

      Nikiforov, S. V., & Mamaev, V. B. (1998). The development of sex differences in cardiovascular disease mortality: a historical perspective. Am J Public Health, 88(9), 1348-1353. https://doi.org/10.2105/ajph.88.9.1348

      Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998). Using the correct statistical test for the equality of regression coefficients. Criminology, 36(4), 859-866.

      Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.

      Evaluations:

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures and analyses are solid. The findings are interesting and novel.

      In the original submission, it was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified. However, this concern has been satisfactorily addressed in the revision.

      We thank the reviewer for his/her positive evaluation and thoughtful comments. 

      Reviewer #2 (Public review):

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, dots moving in one direction, followed by another direction, etc.), showing either the starting movement direction or the ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies.

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. The statistical methodology seems valid, but due to its complexity it is not easy to understand. The methods, especially those described in Figures 3 and 4, should be explained better.

      We thank the reviewer for the detailed evaluation. As suggested, we have further revised the Methods and Results sections, particularly the descriptions related to Figures 3 and 4, to enhance clarity. Please see the revisions highlighted in red in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The most important results here are in Figure 4, and they rely on methods explained in Figure 3. Figure 4 and the results in the figure are confusing.

      What is the red bar in Figure 4B,E? What are the units of the Y axis in Figure 4B,E?

      Does sequenceness have units? How do we interpret these magnitudes apart from the line of statistical significance? Shouldn't there be two lines, one for forward replay and the other for backward replay rather than a single line with positive and negative values? The term sequenceness is defined in Figure 3, and is key. The replayed sequence in Figure 4A,D seems to last about 120 ms.

      What is the meaning of having significance only within a window of 28-36 ms?

      We thank the reviewer for the careful reading and insightful comments. We apologize for the lack of clarity regarding these details in the previous version. As mentioned above, we have revised the Methods and Results sections to enhance clarity throughout the manuscript. For convenience, we provide detailed explanations addressing the specific points raised by the reviewer below.

      First, the red bars in Figures 4B and 4E indicate the lags when the evidence of sequenceness surpassed the statistical significance threshold, as determined by permutation testing. We have now explicitly clarified this in the revised figure captions.

      Second, sequenceness doesn’t have units. It corresponds to the regression coefficient (β) obtained from the second-level GLM in the TDLM framework. Specifically, in the first step of TDLM, we constructed an empirical transition matrix that quantifies the evidence for all possible transitions (e.g., 0° → 90°) at each time lag (Δt). In the second step, we evaluated the extent to which each model transition matrix (e.g., forward or backward transitions) predicts the empirical transition matrix at each Δt, yielding second-level β values. Sequenceness is defined as the difference between the β values for the forward and backward transition models, reflecting the relative strength and directionality of sequential replay. As it is derived from regression coefficients, sequenceness is inherently a unitless measure.

      Regarding the interpretation of sequenceness magnitudes beyond statistical significance, the β values reflect the extent to which the model transition matrix explains variance in the empirical transition matrix. While larger β values suggest stronger sequenceness, absolute magnitudes are influenced by various factors, such as between-participant noise. Therefore, the key criterion for interpreting these values is whether they surpass permutation-based significance thresholds, which indicate that the observed sequenceness is unlikely to have occurred by chance.

      Third, as the reviewer correctly pointed out, we initially computed two separate regression lines, one for forward replay and the other for backward replay. We then defined sequenceness as the contrast between the forward and backward replay (forward minus backward). This contrast approach is commonly used in previous studies to remove between-participant variance in the sequential replay per se, which may arise due to variability in task engagement or measurement sensitivity (Liu et al., 2021; Nour et al., 2021).

      Finally, regarding the duration of replay events, the example sequences shown in Figures 4A and 4D indeed span about 120 ms in total. However, the time lag (Δt) between successive reactivation peaks within these sequences is about 30 ms. This is in line with the findings shown in Figures 4B and 4E, where statistical significance is observed at a time lag window of 28 – 36 ms on the x-axis. It is important to note that the x-axis in these plots represents the time lag (Δt) between sequential reactivations, rather than absolute time.
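      The two-level computation described above, and the forward-minus-backward contrast, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name, the extra self-transition and constant regressors in the second-level design, and the synthetic data are all assumptions, and lags are counted in samples rather than milliseconds.

```python
import numpy as np


def sequenceness(X, T_fwd, lags):
    """X: (time, states) decoded reactivation time courses.
    T_fwd: (states, states) model matrix with T_fwd[i, j] = 1 for i -> j.
    lags: positive integer lags in samples.
    Returns the forward-minus-backward second-level beta for each lag."""
    n_time, n_states = X.shape
    T_bwd = T_fwd.T
    # Second-level design: forward model, backward model,
    # self-transitions and a constant (the last two are assumed nuisance terms).
    design = np.column_stack([
        T_fwd.ravel(),
        T_bwd.ravel(),
        np.eye(n_states).ravel(),
        np.ones(n_states * n_states),
    ])
    result = []
    for lag in lags:
        past = np.column_stack([X[:-lag], np.ones(n_time - lag)])
        future = X[lag:]
        # First level: empirical transition matrix at this lag,
        # B[i, j] = evidence that state i is followed by state j.
        B = np.linalg.lstsq(past, future, rcond=None)[0][:n_states]
        # Second level: how well each model matrix explains B.
        betas = np.linalg.lstsq(design, B.ravel(), rcond=None)[0]
        result.append(betas[0] - betas[1])
    return np.array(result)


# Synthetic check: four states reactivated in reverse order, 3 samples apart.
rng = np.random.default_rng(0)
X = 0.01 * rng.standard_normal((2000, 4))
for onset in range(20, 1900, 60):
    for step, state in enumerate([3, 2, 1, 0]):
        X[onset + 3 * step, state] += 1.0

T_fwd = np.zeros((4, 4))
for i in range(3):
    T_fwd[i, i + 1] = 1.0   # forward order 0 -> 1 -> 2 -> 3

lags = np.arange(1, 11)
seq = sequenceness(X, T_fwd, lags)
```

      In this toy example the whole reverse sequence spans 9 samples, yet the contrast is strongly negative only at the 3-sample lag between successive reactivations. This mirrors the distinction drawn above between the total duration of a replay event and the time lag on the x-axis.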

      We hope these clarifications address the reviewer’s concerns, and we have revised the manuscript accordingly to make these points clearer to readers.

      The methods here are not simple and not simple to explain. The new version is easier to understand. From the new version it seems that the methodology is sound. It should be still clarified and better explained.

      We have carefully revised the manuscript to better explain the methodology. We appreciate the reviewer’s feedback, which is valuable in improving the clarity of our work.

      Now that I understand what they mean by decoding probability, I think that this term is confusing or even misleading. The decoding accuracy is the probability that the direction of motion classification was correct. It seems the so-called decoding probability is the value of the logistic regression after normalizing the sum to 1. If this is a standard term it can probably be kept, if not another term would be better.

      We thank the reviewer for this comment. We agree that the term decoding probability may initially seem confusing. However, decoding probability is a commonly used term in the neural decoding literature, particularly in human studies (e.g., Liu et al., 2019; Nour et al., 2021; Turner et al., 2023). To maintain consistency with previous work, we have kept this term in the manuscript. We appreciate the opportunity to clarify this point.
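      To make the distinction concrete, the quantity as the reviewer describes it (logistic outputs normalized to sum to 1 across classes) can be sketched as follows. This illustrates the reviewer's reading only, assuming one-vs-rest logistic classifiers; the function name and the input scores are hypothetical.

```python
import numpy as np


def decoding_probability(scores):
    """scores: (time, classes) linear decision values, one column per
    one-vs-rest logistic classifier (assumed setup).
    Returns per-class values that sum to 1 at each time point."""
    logistic = 1.0 / (1.0 + np.exp(-scores))            # per-class logistic output
    return logistic / logistic.sum(axis=1, keepdims=True)


scores = np.array([[2.0, 0.0, -1.0, 0.5]])   # hypothetical values for 4 motion directions
probs = decoding_probability(scores)
```

      Decoding accuracy, by contrast, would be the fraction of trials on which the class with the largest value matches the true motion direction, so the two quantities answer different questions.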

      References

      Liu, Y., Dolan, R. J., Higgins, C., Penagos, H., Woolrich, M. W., Ólafsdóttir, H. F., Barry, C., Kurth-Nelson, Z., & Behrens, T. E. (2021). Temporally delayed linear modelling (TDLM) measures replay in both animals and humans. eLife, 10, e66917. https://doi.org/10.7554/eLife.66917

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Nour, M. M., Liu, Y., Arumuham, A., Kurth-Nelson, Z., & Dolan, R. J. (2021). Impaired neural replay of inferred relationships in schizophrenia. Cell, 184(16), 4315-4328.e17. https://doi.org/10.1016/j.cell.2021.06.012

      Turner, W., Blom, T., & Hogendoorn, H. (2023). Visual Information Is Predictively Encoded in Occipital Alpha/Low-Beta Oscillations. Journal of Neuroscience, 43(30), 5537–5545. https://doi.org/10.1523/JNEUROSCI.0135-23.2023

    1. Author response:

      We thank the editors and the reviewers for their valuable comments and for taking the time to evaluate our manuscript.

      Answers to Reviewer 1:

      (1) The core contribution of our method is that it learns meaningful spatiotemporal embeddings directly from image data without requiring pose estimation or eigenworm-based features as input. The learned embedding space can serve as a foundation for downstream tasks such as behavioral classification, clustering, or anomaly detection, further supporting its utility beyond visualization through eigenworm-derived features. Here we use the Tierpsy-derived features for latent space interpretation and for validation that our approach does indeed encode meaningful postural information. Additionally, without any Tierpsy-calculated features users can still color embeddings by known metadata like mutation or age and compare different strains to each other. 

      (2) The numbers shown in Fig. 2.3 are illustrative placeholders intended to conceptually represent a vector of behavioral features. They do not correspond to any specific measurements or carry intrinsic meaning. We agree that this may lead to confusion, and we will clarify this in the revised manuscript.

      (3) The visualizations in Figs. 4 (b) and (c) show the embeddings of sequences of behavior, rather than individual poses. Therefore, motion-related features such as speed are related to temporal patterns in those sequences rather than static postures. The color overlays reflect average motion characteristics (e.g., speed) of short behavior clips projected into the embedding space, rather than being directly linked to any single frame or pose.

      Answers to Reviewer 2:

      (1) In the abstract, our use of the term "unbiased" refers specifically to the avoidance of human-generated bias through feature engineering; that is, the model does not rely on handcrafted features or predefined pose representations, and the representations are based on data only. However, we agree that the model is still subject to dataset biases and will rectify this in the revised manuscript.

      (2) The worm images are rotated to a common vertical orientation to remove orientation as a source of variability in the input. This ensures that the model focuses on learning pose and behavioral dynamics rather than arbitrary head-tail or angular positioning. While data augmentation could in theory account for this variability, we found in our preliminary experiments that applying this preprocessing step led to more stable and interpretable embeddings.

      (3) We agree that simplifying the technical explanations would enhance the manuscript’s accessibility. In the revised version, we will briefly introduce contrastive learning in a less technical language.

      (4) The gray points in Fig. 3a represent frames that Tierpsy could not resolve, primarily due to coiled, self-intersecting, or overlapping worm postures, as Tierpsy uses skeletonization to estimate the centerline. This approach can fail when such challenging elements are part of the image.

      (5) We appreciate this suggestion and will consider it for a revised version of the manuscript.

      (6) Although it may seem intuitive for highly bent (red) poses to lie near coiled (gray) ones in the embedding space, the clustering pattern observed reflects how the network organizes pose information. The red/orange cluster consists of distinguishable bent poses that are visually distinct and consistently separable from other postures. In contrast, the greenish and blueish poses are less strongly bent and may share more visual overlap with the unresolved (gray) images.

      (7) The overlap occurs because some highly bent or coiled worms can still be (partially) resolved by Tierpsy, depending on specific pose conditions (e.g., head and tail not touching, not self-overlapping). However, Tierpsy fails to consistently resolve such frames. We will describe these cases in more detail in the revised manuscript.

      (8) Thank you, we agree this claim needs to be better supported and will develop it in the revision.

      (9) To support this statement we mainly visualized the respective sequences embedded in this area of the embedding space and found that it mostly consists of common behaviors such as forward locomotion. 

      (10) We agree that interpretability is important and plan to include additional figures and quantifications of the embedding space using more basic Tierpsy features.

      (11) Fig. 5a is indeed based solely on N2 animals. In the revised manuscript we will include quantitative measures of behavioral variability and its change with age.

      (12) We appreciate this suggestion and will consider it for a revised version.

      (13) We agree this would be a valuable analysis. However, our current dataset primarily includes aging data for N2 animals. We acknowledge this limitation and will consider adding more strains in future work.

      (14) We will include links to our source code in the revised manuscript.
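      As an illustration of the preprocessing mentioned in answer (2), one common way to bring an elongated object to a vertical orientation is to align its principal axis with the vertical. The response does not say how the rotation is computed, so the sketch below is an assumption throughout: the function name, the binary-mask input and the PCA-based approach are all hypothetical.

```python
import numpy as np


def rotation_to_vertical(mask):
    """Angle in degrees between the main axis of a binary mask and the
    vertical image axis, folded into (-90, 90]. The sign depends on the
    image coordinate convention (rows increase downward here)."""
    rows, cols = np.nonzero(mask)
    coords = np.column_stack([cols - cols.mean(), rows - rows.mean()])  # (x, y)
    # Principal axis: eigenvector of the covariance with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords, rowvar=False))
    vx, vy = eigvecs[:, np.argmax(eigvals)]
    angle = float(np.degrees(np.arctan2(vx, vy)))
    # An axis has no direction, so angles 180 degrees apart are equivalent.
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle
```

      An alignment of this kind is consistent with a head-tail agnostic representation, because an axis alone cannot distinguish the head end from the tail end.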

      Answers to Reviewer 3:

      (1-2) Our current method is agnostic to head-tail orientation, which indeed restricts the ability to distinguish behaviors that rely on directional cues. We made this design choice as we believe that correctly identifying head/tail orientation can be a challenging task that may introduce additional biases or fail in difficult imaging conditions. However, we fully agree that integrating directional information would improve behavioral resolution, and this is a natural extension of our current framework. In future work, we aim to incorporate head-tail disambiguation.

      (3) We explicitly designed our preprocessing and training pipeline to encourage size invariance, for example by resizing individuals to a consistent scale, as the focus of our work is to encode posture and movement only. However, we acknowledge that absolute size information is lost in this process, which can be informative for distinguishing genotypes or age-related changes.

      (4) We agree that a direct quantitative comparison between our embedding-based representations and skeleton-based feature sets would strengthen the paper. Our current focus was to assess whether meaningful behavioral features could be learned from a skeleton-free representation.

    1. Author response:

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Given that the target SSVEP amplitude is stronger than that of the distractor, the distractor SSVEP amplitude may be contaminated by the target SSVEP amplitude through spectral power leakage; see Figure S4 for a demonstration of this. Because of this issue, we introduce decoding accuracy as an index of distractor processing, which has not previously been done in the SSVEP literature. The lack of correlation between the distractor SSVEP amplitude and the distractor decoding accuracy, although somewhat like comparing apples with oranges, as the reviewer points out, shows that the two measures do not co-vary. The decoding accuracy is therefore free from the influence of the distractor SSVEP amplitude and, in turn, free from the influence of the target SSVEP amplitude. This is an important point, and we will discuss it more thoroughly in the revised manuscript.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      This is an important point. We plan to follow the reviewer’s suggestion and repeat our analysis using different window sizes to test the robustness of the observed 1 Hz rhythmicity. In addition, we plan to apply the Hilbert transform to extract time-point-by-time-point amplitude envelopes, which will provide a window-free estimate of the distractor strength and further test for the presence of the low-frequency 1 Hz dynamics.
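
      As an illustration only, a window-free envelope estimate of this kind could be sketched as follows. The sketch assumes a signal that has already been band-limited around the frequency of interest; the sampling rate, carrier frequency, and modulation depth are hypothetical toy values, not parameters of the study. The analytic signal is built with a naive DFT to keep the example dependency-free; in practice a library routine such as `scipy.signal.hilbert` would normally be used.

```python
import cmath
import math


def analytic_signal(x):
    """Return the analytic signal of a real sequence via a naive O(n^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero out the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    return [
        sum(spectrum[k] * gain[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def amplitude_envelope(x):
    """Instantaneous amplitude: magnitude of the analytic signal at each sample."""
    return [abs(z) for z in analytic_signal(x)]


# Toy signal (hypothetical values): a 10 Hz carrier whose amplitude is
# modulated at 1 Hz, sampled at 100 Hz for 2 s.
FS_HZ = 100.0
CARRIER_HZ = 10.0
MOD_HZ = 1.0
N = 200
times = [i / FS_HZ for i in range(N)]
true_envelope = [1.0 + 0.5 * math.cos(2 * math.pi * MOD_HZ * t) for t in times]
signal = [e * math.cos(2 * math.pi * CARRIER_HZ * t)
          for e, t in zip(true_envelope, times)]

envelope = amplitude_envelope(signal)
max_error = max(abs(a - b) for a, b in zip(envelope, true_envelope))
```

      Because the envelope is defined at every sample, no window length enters the estimate, so any 1 Hz structure in `envelope` cannot be attributed to the 0.5 s window or the 0.25 s step.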

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is a good point. In addition to acknowledging this in the revised manuscript, we will carry out two additional analyses to test this issue further. First, we will implement a random permutation procedure, in which the trial labels are randomly shuffled to build the null-hypothesis distribution for decoding accuracy, and compare the decoding accuracy from the actual data to this distribution. Second, we will perform a temporal generalization analysis to examine whether the neural representations of the distractor drift over the course of an entire trial, which is 11 seconds long. Recent studies suggest that even when a stimulus stays the same, its neural representation may drift over time.
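
      The permutation procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline of the study: the decoder is a hypothetical leave-one-out nearest-class-mean classifier on one-dimensional features, and the data, number of permutations, and seed are toy values.

```python
import random


def loo_nearest_mean_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier (1-D features)."""
    n = len(features)
    correct = 0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue  # hold out trial i
            sums[labels[j]] = sums.get(labels[j], 0.0) + features[j]
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        predicted = min(means, key=lambda c: abs(features[i] - means[c]))
        if predicted == labels[i]:
            correct += 1
    return correct / n


def permutation_test(score_fn, features, labels, n_perm=200, seed=0):
    """Compare the observed score with a null distribution from shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the feature-label association
        null_scores.append(score_fn(features, shuffled))
    n_extreme = sum(1 for s in null_scores if s >= observed)
    # The +1 terms count the observed labeling as one permutation,
    # so the p-value can never be exactly zero.
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, null_scores, p_value


# Toy data (hypothetical): two well-separated categories, 10 trials each.
toy_features = [0.1 * i for i in range(10)] + [2.0 + 0.1 * i for i in range(10)]
toy_labels = [0] * 10 + [1] * 10
observed, null_scores, p_value = permutation_test(
    loo_nearest_mean_accuracy, toy_features, toy_labels
)
```

      The key design point is that the whole cross-validated decoding step is rerun for every shuffle, so the null distribution reflects the same estimator, and therefore the same empirical chance level, as the observed accuracy.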

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We feel that what the reviewer points out is actually the main point of our study, namely, that it is not the overall target or distractor strength that matters for behavior but their temporal relationship. This reveals a novel neuroscience principle that has not been reported in the past. We will stress this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. This said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We plan to normalize each time course, make them dimensionless, and then compute the temporal relations between them.   
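
      A rough sketch of this normalization and relative-phase computation is given below. It is an illustration under assumptions rather than the analysis used in the study: the two time courses are toy sinusoids, the 4 Hz sampling rate simply follows from a 0.25 s window step, and the helper names are hypothetical. Each series is z-scored to make it dimensionless, and its phase at 1 Hz is read from a single Fourier coefficient.

```python
import cmath
import math


def zscore(x):
    """Return a dimensionless copy of x with zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]


def phase_at(x, freq_hz, fs_hz):
    """Phase (radians) of the Fourier coefficient of x at freq_hz."""
    coeff = sum(
        v * cmath.exp(-2j * math.pi * freq_hz * t / fs_hz)
        for t, v in enumerate(x)
    )
    return cmath.phase(coeff)


def relative_phase(a, b, freq_hz, fs_hz):
    """Phase of a minus phase of b at freq_hz, wrapped to (-pi, pi]."""
    diff = phase_at(zscore(a), freq_hz, fs_hz) - phase_at(zscore(b), freq_hz, fs_hz)
    return math.atan2(math.sin(diff), math.cos(diff))


# Toy time courses (hypothetical): 10 s sampled at 4 Hz (0.25 s step).
FS_HZ = 4.0
RHYTHM_HZ = 1.0
N = 40
# "Target" series in amplitude-like units; "distractor" series in
# accuracy-like units, lagging the target by a quarter cycle.
target = [3.0 + 2.0 * math.cos(2 * math.pi * RHYTHM_HZ * i / FS_HZ)
          for i in range(N)]
distractor = [0.55 + 0.02 * math.cos(2 * math.pi * RHYTHM_HZ * i / FS_HZ - math.pi / 2)
              for i in range(N)]

rel = relative_phase(target, distractor, RHYTHM_HZ, FS_HZ)
```

      Note that the phase estimate is unaffected by a rescaling of either series; the z-scoring matters mainly for amplitude-sensitive comparisons (for example, cross-correlation) and for plotting the two time courses on a common axis.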

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. We will try our best in the revision process to address the concerns.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, we feel that fMRI, with its well-known limitation in temporal resolution, has limited potential to contribute. We will explore this rich dataset in other ways that integrate the two modalities to gain insights not possible with either modality alone.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1 Hz rhythmicity. Also, the fact that the temporal relations between target processing and distractor processing at 1 Hz predict behavior is another indication that the 1 Hz rhythmicity is a neuroscientific effect, not an artifact. However, we will look into this carefully and address it in the revision process.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      Thank you for the feedback! We agree that the complexity of our model can make it challenging to intuitively understand the underlying mechanisms. To address this, we have revised the manuscript to include additional simulations and clearer explanations of the mechanisms at play.

      In the revised introduction, we now explicitly state our primary aim: to assess to what extent a biophysically detailed neuron model can support the theory proposed by Tran-Van-Minh et al. and explore whether such computations can be learned by a single neuron, specifically a projection neuron in the striatum. To achieve this, we focus on several key mechanisms:

      (1) A local learning rule: We develop a learning rule driven by local calcium dynamics in the synapse and by reward signals from the neuromodulator dopamine. This plasticity rule is based on the known synaptic machinery for triggering LTP or LTD in the corticostriatal synapse onto dSPNs (Shen et al., 2008). Importantly, the rule does not rely on supervised learning paradigms, nor does it require separate training and testing phases.

      (2) Robust dendritic nonlinearities: According to Tran-Van-Minh et al. (2015), sufficient supralinear integration is needed to ensure that, e.g., two inputs (i.e. one feature combination in the NFBP, Figure 1A) on the same dendrite generate greater somatic depolarization than if those inputs were distributed across different dendrites. To accomplish this, we generate sufficiently robust dendritic plateau potentials using the approach in Trpevski et al. (2023).

      (3) Metaplasticity: Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights. This mechanism ensures that synaptic strengths remain within biologically plausible ranges during training, regardless of initial synaptic weights.

      We have also clarified our design choices and the rationale behind them, as well as restructured the interpretation of our results for greater accessibility. We hope these revisions make our approach and findings more transparent and easier to engage with for a broader audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study extends three previous lines of work:  

      (1) Prior computational/phenomenological work has shown that the presence of dendritic nonlinearities can enable single neurons to perform linearly non-separable tasks like XOR and feature binding (e.g. Tran-Van-Minh et al., Front. Cell. Neurosci., 2015).

      Prior computational and phenomenological work, such as Tran-Van-Minh et al. (Front. Cell. Neurosci., 2015), directly inspired our study, as we now explicitly state in the introduction (page 4, lines 19-22). While Tran-Van-Minh theoretically demonstrated that these principles could solve the NFBP, it remains untested to what extent this can be achieved quantitatively in biophysically detailed neuron models using biologically plausible learning rules - which is what we test here.

      (2) This study and a previous biophysical modeling study (Trpevski et al., Front. Cell. Neurosci., 2023) rely heavily on the finding from Chalifoux & Carter, J. Neurosci., 2011 that blocking glutamate transporters with TBOA increases dendritic calcium signals. The proposed model thus depends on a specific biophysical mechanism for dendritic plateau potential generation, where spatiotemporally clustered inputs must be co-activated on a single branch, and the voltage compartmentalization of the branch and the voltage-dependence of NMDARs is not enough, but additionally glutamate spillover from neighboring synapses must activate extrasynaptic NMDARs. If this specific biophysical implementation of dendritic plateau potentials is essential to the findings in this study, the authors have not made that connection clear. If it is a simple threshold nonlinearity in dendrites that is important for the model, and not the specific underlying biophysical mechanisms, then the study does not appear to provide a conceptual advance over previous studies demonstrating nonlinear feature binding with simpler implementations of dendritic nonlinearities.

      We appreciate the feedback on the hypothesized role of glutamate spillover in our model. While the current manuscript and Trpevski et al. (2023) emphasize glutamate spillover as a plausible biophysical mechanism to provide sufficiently robust and supralinear plateau potentials, we acknowledge that supralinear dendritic integration might not depend solely on this specific mechanism in other types of neurons. In Trpevski et al. (2023), however, we found that if we allowed dendritic plateaus that were too ‘graded’, using the quite shallow Mg-block reported in experiments, it was difficult to solve the NFBP. The conceptual advance of our study lies in demonstrating that sufficiently nonlinear dendritic integration is needed and that this can be accounted for by assuming spillover in SPNs. Regardless of its biophysical source (e.g. NMDA spillover, steeper NMDA Mg-block activation curves, or other voltage-dependent conductances that cause supralinear dendritic integration), such nonlinearity enables biophysically detailed neurons to solve the nonlinear feature binding problem. To address this point and clarify the generality of our conclusions, we have revised the relevant sections in the manuscript to state this explicitly.

      (3) Prior work has utilized "sliding-threshold," BCM-like plasticity rules to achieve neuronal selectivity and stability in synaptic weights. Other work has shown coordinated excitatory and inhibitory plasticity. The current manuscript combines "metaplasticity" at excitatory synapses with suppression of inhibitory strength onto strongly activated branches. This resembles the lateral inhibition scheme proposed by Olshausen (Christopher J. Rozell, Don H. Johnson, Richard G. Baraniuk, Bruno A. Olshausen; Sparse Coding via Thresholding and Local Competition in Neural Circuits. Neural Comput 2008; 20 (10): 2526-2563. doi: https://doi.org/10.1162/neco.2008.03-07-486). However, the complexity of the biophysical model makes it difficult to evaluate the relative importance of the additional complexity of the learning scheme.

      We initially tried solving the NFBP with only excitatory plasticity, which worked reasonably well, especially if we assume a small population of neurons collaborates under physiological conditions. However, we observed that plateau potentials from distally located inputs were less effective, and we now explain this limitation in the revised manuscript (page 14, lines 23-37).

      To address this, we added inhibitory plasticity inspired by mechanisms discussed in Castillo et al. (2011), Ravasenga et al., and Chapman et al. (2022), as now explicitly stated in the text (page 32, lines 23-26). While our GABA plasticity rule is speculative, it demonstrates that distal GABAergic plasticity can enhance nonlinear computations. These results are particularly encouraging, as they show that implementing these mechanisms at the single-neuron level produces behavior consistent with network-level models such as BCM-like plasticity rules and those proposed by Rozell et al. We hope this will inspire further experimental work on inhibitory plasticity mechanisms.

      P2, paragraph 2: Grammar: "multiple dendritic regions, preferentially responsive to different input values or features, are known to form with close dendritic proximity." The meaning is not clear. "Dendritic regions" do not "form with close dendritic proximity."

      Rewritten (current page 2, line 35)

      P5, paragraph 3: Grammar: I think you mean "strengthened synapses" not "synapses strengthened".

      Rewritten (current page 14, line 36)

      P8, paragraph 1: Grammar: "equally often" not "equally much".

      Updated (current page 10, line 2)

      P8, paragraph 2: "This is because of the learning rule that successively slides the LTP NMDA Ca-dependent plasticity kernel over training." It is not clear what is meant by "sliding," either here or in the Methods. Please clarify.

      We have updated the text and removed the word “sliding” throughout the manuscript to clarify that the calcium dependence of the kernels is in fact updated.

      P10, Figure 3C (left): After reading the accompanying text on P8, para 2, I am left not understanding what makes the difference between the two groups of synapses that both encode "yellow," on the same dendritic branch (d1) (so both see the same plateau potentials and dopamine) but one potentiates and one depresses. Please clarify.

      Some "yellow" and "banana" synapses are initialized with weak conductances, limiting their ability to learn due to the relatively slow dynamics of the LTP kernel. These weak synapses fail to reach the calcium thresholds necessary for potentiation during a dopamine peak, yet they remain susceptible to depression under LTD conditions. Initially, the dynamics of the LTP kernel do not allow significant potentiation, even in the presence of appropriate signals such as plateau potentials and dopamine (page 10, lines 22–26). We have added a more detailed explanation of how the learning rule operates in the section “Characterization of the Synaptic Plasticity Rule” on page 9 and have clarified the specific reason why the weaker yellow synapses undergo LTD (page 11, lines 1–7).

      As shown in Supplementary Figure 6, during subthreshold learning, the initial conductance is also low, which similarly hinders the synapses' ability to potentiate. However, with sufficient dopamine, the LTP kernel adapts by shifting closer to the observed calcium levels, allowing these synapses to eventually strengthen. This dynamic highlights how the model enables initially weak synapses to "catch up" under consistent activation and favorable dopaminergic conditions.
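
      The "catch up" behavior described above can be illustrated with a deliberately simplified toy, which is not the model's actual equations. It assumes a hypothetical bell-shaped LTP kernel over calcium, a fixed calcium level for a weak synapse (in the full model calcium would itself depend on the synaptic weight), and a hypothetical metaplasticity step that moves the kernel midpoint toward the observed calcium on each dopamine-rewarded trial. All parameter values are arbitrary.

```python
import math


def ltp_kernel(calcium, midpoint, width):
    """Hypothetical bell-shaped LTP kernel: largest when calcium is near the midpoint."""
    return math.exp(-((calcium - midpoint) ** 2) / (2.0 * width ** 2))


def simulate_rewarded_trials(calcium, midpoint, width=0.1, learning_rate=0.05,
                             shift_rate=0.2, n_trials=30):
    """Repeated dopamine-rewarded trials for one synapse with a fixed calcium level.

    Returns the per-trial weight increments and the final kernel midpoint.
    """
    increments = []
    for _ in range(n_trials):
        # Potentiation is gated by how well calcium matches the current kernel.
        increments.append(learning_rate * ltp_kernel(calcium, midpoint, width))
        # Toy metaplasticity: the kernel midpoint moves toward observed calcium.
        midpoint += shift_rate * (calcium - midpoint)
    return increments, midpoint


# A weak synapse whose calcium (0.2, arbitrary units) starts far below
# the kernel midpoint (1.0), so early potentiation is negligible.
increments, final_midpoint = simulate_rewarded_trials(calcium=0.2, midpoint=1.0)
```

      In this toy, the first increments are essentially zero because the calcium level lies far outside the kernel, and they grow toward the full learning rate as the midpoint converges on the observed calcium, mirroring the qualitative description of initially weak synapses eventually strengthening.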

      P9, paragraph 1: The phrase "the metaplasticity kernel" is introduced here without prior explanation or motivation for including this level of complexity in the model. Please set it up before you use it.

      A sentence introducing metaplasticity has been added to the introduction (page 3, lines 36-42) as well as on page 9, where the kernel is introduced (page 9, lines 26-35).

      P10, Figure 3D: "kernel midline" is not explained.

      We have replotted Fig. 3 to make it easier to understand what is shown. An explanation of the kernel midpoint has also been added to the legend (current page 12, line 19).

      P11, paragraph 1; P13, Fig. 4C: My interpretation of these data is that clustered connectivity with specific branches is essential for the performance of the model. Randomly distributing input features onto branches (allowing all 4 features to innervate single branches) results in poor performance. This is bad, right? The model can't learn unless a specific pre-wiring is assumed. There is not much interpretation provided at this stage of the manuscript, just a flat description of the result. Tell the reader what you think the implications of this are here.

      Thanks for the suggestion - we have updated this section of the manuscript, adding an interpretation of the results that the model often fails to learn both relevant stimuli if all four features are clustered onto the same dendrite (page 13, lines 31-42). 

      In summary, when multiple feature combinations are encoded in the same dendrite with similar conductances, the ability to determine which combination to store depends on the dynamics of the other dendrite. Small variations in conductance, training order, or other stochastic factors can influence the outcome. This challenge, known as the symmetry-breaking problem, has been previously acknowledged in abstract neuron models (Legenstein and Maass, 2011). To address this, additional mechanisms such as branch plasticity—amplifying or attenuating the plateau potential as it propagates from the dendrite to the soma—can be employed (Legenstein and Maass, 2011). 

      P12, paragraph 2; P13, Figure 4E: This result seems suboptimal, that only synapses at a very specific distance from the soma can be used to effectively learn to solve a NFBP. It is not clear to what extent details of the biophysical and morphological model are contributing to this narrow distance-dependence, or whether it matches physiological data.

      We have added Figure 5—figure supplement 1A to clarify why distal synapses may not optimally contribute to learning. This figure illustrates how inhibitory plasticity improves performance by reducing excessive LTD at distal dendrites, thereby enhancing stimulus discrimination. Relevant explanations have been integrated into Page 18, Lines 25-39 in the revised manuscript.

      P14, paragraph 2: Now the authors are assuming that inhibitory synapses are highly tuned to stimulus features. The tuning of inhibitory cells in the hippocampus and cortex is controversial but seems generally weaker than excitatory cells, commensurate with their reduced number relative to excitatory cells. The model has accumulated a lot of assumptions at this point, many without strong experimental support, which again might make more sense when proposing a new theory, but this stitching together of complex mechanisms does not provide a strong intuition for whether the scheme is either biologically plausible or performant for a general class of problem.

      We acknowledge that it is not currently known whether inhibitory synapses in the striatum are tuned to stimulus features. However, given that the striatum is a purely inhibitory structure, it is plausible that lateral inhibition from other projection neurons could be tuned to features, even if feedforward inhibition from interneurons is not. Therefore, we believe this assumption is reasonable in the context of our model. As noted earlier, the GABA plasticity rule in our study is speculative. However, we hope that our work will encourage further experimental investigations, as we demonstrate that if GABAergic inputs are sufficiently specific, they can significantly enhance computations (this is discussed on page 17, lines 8-15).

      P16, Figure 5E legend: The explanation of the meaning of T_max and T_min in the legend and text needs clarification.

      The abbreviations T_min and T_max have been updated to CTL and CTH to better reflect their role in calcium threshold tracking. The Figure 5E legend and relevant text have been revised for clarity. Additionally, the Methods section has been reorganized for better readability.

      P16, Figure 5B, C: When the reader reaches this paper, the conundrums presented in Figure 4 are resolved. The "winner-takes-all" inhibitory plasticity both increases the performance when all features are presented to a single branch and increases the range of somatodendritic distances where synapses can effectively be used for stimulus discrimination. The problem, then, is in the narrative. A lot more setup needs to be provided for the question related to whether or not dendritic nonlinearity and synaptic inhibition can be used to perform the NFBP. The authors may consider consolidating the results of Fig. 4 and 5 so that the comparison is made directly, rather than presenting them serially without much foreshadowing.

      In order to facilitate readability, we have updated the following sections of the manuscript to clarify how inhibitory plasticity resolves challenges from Figure 4:

      Figure 5B and Figure 5–figure supplement 1B: Two new panels illustrate the role of inhibitory plasticity in addressing symmetry problems.

      Figure 5–figure supplement 1A: Shows how inhibitory plasticity extends the effective range of somatodendritic distances.

      P18, Figure 6: This should be the most important figure, finally tying in all the previous complexity to show that NFBP can be partially solved with E and I plasticity even when features are distributed randomly across branches without clustering. However, now bringing in the comparison across spillover models is distracting and not necessary. Just show us the same plateau generation model used throughout the paper, with and without inhibition.

      Figure updated. Accumulative spillover and no-spillover conditions have been removed.

      P18, paragraph 2: "In Fig. 6C, we report that a subset of neurons (5 out of 31) successfully solved the NFBP." This study could be significantly strengthened if this phenomenon could (perhaps in parallel) be shown to occur in a simpler model with a simpler plateau generation mechanism. Furthermore, it could be significantly strengthened if the authors could show that, even if features are randomly distributed at initialization, a pruning mechanism could gradually transition the neuron into the state where fewer features are present on each branch, and the performance could approach the results presented in Figure 5 through dynamic connectivity.

      Modeling structural plasticity is a good suggestion that should be investigated in later work; however, we feel that it goes beyond what we can do in the current manuscript. We now acknowledge that structural plasticity might play a role. For example, we show that if we assume ‘branch-specific’ spillover, which leads to sufficient development of local dendritic nonlinearities, the neuron can also learn with distributed inputs. In reality, structural plasticity is likely important here, as we now state (current page 22, lines 35-42).

      P17, paragraph 2: "As shown in Fig. 6B, adding the hypothetical nonlinearities to the model increases the performance towards solving part of the NFBP, i.e. learning to respond to one relevant feature combination only. The performance increases with the amount of nonlinearity." This is not shown in Figure 6B.

      Sentence removed. We have added Figure 6–figure supplement 1 to better explain the limitations.

      P22, paragraph 1: The "w" parameter here is used to determine whether spatially localized synapses are co-active enough to generate a plateau potential. However, this is the same w learned through synaptic plasticity. Typically LTP and LTD are thought of as changing the number of postsynaptic AMPARs. Does this "w" also change the AMPAR weight in the model? Do the authors envision this as a presynaptic release probability quantity? If so, please state that and provide experimental justification. If not, please justify modifying the activation of postsynaptic NMDARs through plasticity.

      This is an important remark. Our plasticity model differs from classical LTP models as it depends on the link between LTP and increased spillover as described by Henneberger et al. (2020).

      We have updated the Methods section (page 27, lines 6-11). We acknowledge, however, that in a real cell, learning might first strengthen the AMPA component, while after learning the NMDA/AMPA ratio is unchanged (Watt et al., 2004). This rebalancing between NMDA and AMPA might be a slower process.

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      Thanks for the valuable feedback. We have now gone through the whole manuscript updating the text, and also improved figures and added some supplementary figures to better explain model mechanisms. In particular, we state more clearly our goal already in the introduction.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The model is quite detailed but builds on previous work. For model components used in earlier published work (and where models are already available via model repositories, such as ModelDB), we therefore refer the reader to these resources in order to improve readability and to highlight what is novel in this paper: the learning rule itself. The learning rule is now explained in detail. For modelers who want to run the model, we have also provided a GitHub link to the simulation code. We hope this is a reasonable compromise for all readers, i.e., those who only want to understand what is new here (the learning rule) and those who also want to test the model code. We explain this to the readers at the beginning of the Methods section.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      Thanks for directing our attention to these oversights. We have gone through the entire manuscript, updating the figures where needed, and we are making sure that the text and the figure descriptions are clear and adequate and use consistent terminology for all quantities.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The “Metaplasticity” section (pages 30-32) has been updated to be more concise, and the abundant references to dopamine have been removed.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      We have explicitly stated our goal in the introduction (page 4, lines 19-22). Please also see the response to reviewer 1.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      Thank you for your feedback. To address the concern regarding feature complexity, we extended our simulations to include learning with 9 and 25 features, achieving accuracies of 80% and 75%, respectively (Figure 6—figure supplement 1A). While our results demonstrate effective performance, the absence of external stabilizers—such as error-modulated functions used in prior studies like Bicknell and Hausser (2021)—means that the model's performance can be more sensitive to occasional incorrect outcomes. For instance, while accuracy might reach 90%, a few errors can significantly affect overall performance due to the lack of mechanisms to stabilize learning.

      In order to clarify the setup of the rule, we have added pseudocode in the revised manuscript (Pages 31-32) detailing how the learning rule and metaplasticity update synaptic weights based on calcium and dopamine signals. Additionally, we have included pseudocode for the inhibitory learning rule on Pages 34-35. In future work, we also aim to incorporate biologically plausible mechanisms, such as dopamine desensitization, to enhance stability.
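
      To give readers of this response a feel for what such pseudocode covers, the following is a minimal toy sketch of a calcium- and dopamine-gated weight update with a shifting LTP kernel. It is not the authors' rule: the bell-shaped kernel, the names (`Synapse`, `ltp_kernel`, `update`), and every numerical value are assumptions made for illustration. Only the direction of the kernel shift (towards higher calcium after a dopamine pause, towards lower calcium after a dopamine peak) follows the sentence quoted by the reviewer from the original manuscript.

```python
import math
from dataclasses import dataclass


@dataclass
class Synapse:
    weight: float = 0.3    # synaptic weight, arbitrary units (assumed)
    midpoint: float = 1.0  # centre of the LTP kernel on the calcium axis (assumed)


def ltp_kernel(calcium, midpoint, width=0.3):
    """Bell-shaped LTP kernel: largest when calcium is near the midpoint."""
    return math.exp(-((calcium - midpoint) / width) ** 2)


def update(syn, calcium, dopamine, lr=0.1, shift=0.05, ltd_threshold=0.2):
    """Apply one trial. dopamine is +1 (peak), -1 (pause) or 0 (no feedback)."""
    if dopamine > 0:
        # Dopamine peak: potentiate in proportion to the kernel value,
        # then move the kernel towards lower calcium (metaplasticity).
        syn.weight += lr * ltp_kernel(calcium, syn.midpoint)
        syn.midpoint -= shift
    elif dopamine < 0:
        # Dopamine pause: depress synapses whose calcium exceeded the
        # (assumed) LTD threshold, then move the kernel towards higher calcium.
        if calcium > ltd_threshold:
            syn.weight -= lr * syn.weight
        syn.midpoint += shift
    syn.weight = max(syn.weight, 0.0)  # keep the weight non-negative
    return syn
```

      In this sketch a rewarded trial with calcium at the kernel midpoint produces the largest weight increase, and repeated rewards move the kernel so that the same calcium level eventually potentiates less, which is one simple way a metaplastic shift could limit runaway potentiation.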

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      As now clearly stated in the introduction, the goal of the study is to see whether and to what quantitative extent the theoretical solution of the NFBP proposed in Tran-Van-Minh et al. (2015) can be achieved with biophysically detailed neuron models and with a biologically inspired learning rule. The problem has so far been solved with abstract and phenomenological neuron models (Schiess et al., 2014; Legenstein and Maass, 2011) and also with a detailed neuron model but with a precalculated voltage-dependent learning rule (Bicknell and Häusser, 2021).

      We have also tried to better explain the model mechanisms by adding supplementary figures.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The elevated [Ca²⁺]NMDA with minimal synaptic activation arises from high spine input resistance, small spine volume, and NMDA receptor conductance, which scales calcium influx with synaptic strength. Physiological studies report spine calcium transients typically up to ~1 μM (Franks and Sejnowski 2002, DOI: 10.1002/bies.10193), while our model shows ~7 μM for 0.625 nS and ~3 μM for 0.5 nS, exceeding this range. The calcium levels of the model might therefore be somewhat high compared to biologically measured levels. However, this does not impact the learning rule, as the functional dynamics of the rule remain robust across calcium variations.

      (2) In the distributed synapses section, the study introduces two new mechanisms "Threshold spillover" and "Accumulative spillover". Both mechanisms are not basic concepts but quantitative descriptions of them are missing.

      Thank you for your feedback. Based on the recommendations from Reviewer 1, we have simplified the paper by removing the "Accumulative spillover" and focusing solely on the "Thresholded spillover" mechanism. In the updated version of the paper, we refer to it only as glutamate spillover. However, we acknowledge (page 22, lines 40-42) that to create sufficient non-linearities, other mechanisms, like structural plasticity, might also be involved (although testing this in the model will have to be postponed to future work).

      (3) The learning rule achieves moderate performance when feature-relevant synapses are organized in pre-designed clusters, but for more general distributed synaptic inputs, the model fails to faithfully solve the simple task (with its performance of ~ 75%). Performance results indicate the learning rule proposed, despite its delicate design, is still inefficient when the spatial distribution of synapses grows complex, which is often the case on biological neurons. Moreover, this inefficiency is not carefully analyzed in this paper (e.g. why the performance drops significantly and the possible computation mechanism underlying it).

      The drop in performance when using distributed inputs (to a mean performance of 80%) is similar to the mean performance in the same situation in Bicknell and Hausser (2021), see their Fig. 3C. The drop in performance arises because i) the relevant feature combinations are not often colocalized on the same dendrite so that they can be strengthened together, and ii) even if they are, there may not be enough synapses to trigger the supralinear response from the branch spillover mechanism, i.e. the inputs are not summated in a supralinear way (Fig. 6B, most input configurations only reach 75%).

      Because of this, at most one relevant feature combination can be learned. In the several cases where the random distribution of synapses is favorable for both relevant feature combinations to be learned, the NFBP is solved (Fig. 6B, some performance lines reach 100%; Fig. 6C, example of such a case). We have extended the relevant sections of the paper to highlight the above-mentioned mechanisms.

      Further, the theoretical results in Tran-Van-Minh et al. (2015) already show that solving the NFBP with supralinear dendrites requires features to be pre-clustered in order to evoke the supralinear dendritic response, which would activate the soma. The same number of synapses distributed across the dendrites i) would not excite the soma as strongly, and ii) would summate in the soma as in a point neuron, i.e. no supralinear events, which are necessary to solve the NFBP, can be activated. Hence, one doesn’t expect distributed synaptic inputs to solve the NFBP with any kind of learning rule.
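
      The clustering argument can be illustrated with a toy calculation. The sketch below is not the biophysical model: it assumes binary features, a simple threshold nonlinearity standing in for the dendritic plateau, and hypothetical feature names, branch layouts, thresholds and gains. It checks three cases: pre-clustered features on two supralinear branches, one feature per branch (fully distributed), and a point neuron searched over a small weight grid.

```python
from itertools import product

# A stimulus activates one colour and one shape (illustrative feature names).
COLOURS = ("red", "yellow")
SHAPES = ("strawberry", "banana")
RELEVANT = {("red", "strawberry"), ("yellow", "banana")}


def point_neuron(stimulus, weights, threshold):
    """Linear summation at the soma followed by a spike threshold."""
    return sum(weights[f] for f in stimulus) > threshold


def supralinear(x, theta=1.5, gain=3.0):
    """Toy branch nonlinearity: summed input above theta is amplified."""
    return gain * x if x > theta else x


def dendritic_neuron(stimulus, branches, threshold=3.0):
    """Each branch sums its own synapses, applies the nonlinearity, then the soma sums."""
    drive = sum(
        supralinear(sum(1.0 for f in stimulus if f in branch)) for branch in branches
    )
    return drive > threshold


def solves_nfbp(classify):
    """True if the neuron fires for exactly the relevant feature combinations."""
    return all(
        classify((c, s)) == ((c, s) in RELEVANT) for c, s in product(COLOURS, SHAPES)
    )


clustered = [{"red", "strawberry"}, {"yellow", "banana"}]
distributed = [{"red"}, {"strawberry"}, {"yellow"}, {"banana"}]

clustered_ok = solves_nfbp(lambda s: dendritic_neuron(s, clustered))
distributed_ok = solves_nfbp(lambda s: dendritic_neuron(s, distributed))

# Brute-force search: does any weight/threshold combination on a coarse grid
# let a point neuron separate relevant from irrelevant combinations?
grid = (0.0, 0.5, 1.0, 1.5, 2.0)
point_ok = any(
    solves_nfbp(
        lambda s, w=dict(zip(COLOURS + SHAPES, ws)), t=t: point_neuron(s, w, t)
    )
    for ws in product(grid, repeat=4)
    for t in grid
)
```

      Under these assumptions only the clustered layout solves the task. The point-neuron search fails for a reason that does not depend on the grid: the two relevant combinations and the two irrelevant combinations use the same four weights in total, so both relevant sums cannot exceed a threshold that neither irrelevant sum exceeds.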

      (4) Figure 5B demonstrates that on average adding inhibitory synapses can enhance the learning capabilities to solve the NFBP for different pattern configurations (2, 3, or 4 features), but since the performance for excitatory-only setup varies greatly between different configurations (Figure 4B, using 2 or 3 features can solve while 4 cannot), can the results be more precise about whether adding inhibitory synapses can help improve the learning with 4 features?

      In response to the question, we added a panel to Figure 5B showing that without inhibitory synapses, 5 out of 13 configurations with four features successfully learn, while with inhibitory synapses, this improves to 7 out of 13. Figure 5—figure supplement 1B provides an explanation for this improvement: page 18 line 10-24

      (5) Also, in terms of the possible role of inhibitory plasticity in learning, as only on-site inhibition is studied here, can other types of inhibition be considered, like on-path or off-path? Do they have similar or different effects?

      This is an interesting suggestion for future work. We observed relevant dynamics in Figure 6A, where inhibitory synapses increased their weights on-site when randomly distributed. Previous work by Gidon and Segev (2012) examined the effects of different inhibitory types on NMDA clusters, highlighting the role of on-site and off-path inhibition in shunting. In our context, on-site inhibition in the same branch appears more relevant for maintaining compartmentalized dendritic processing.

      (6) Figure 6A is mentioned in the context of excitatory-only setup, but it depicts the setup when both excitatory and inhibitory synapses are included, which is discussed later in the paper. A correction should be made to ensure consistency.

      We have updated the figure and the text to make it clearer that simulations are run both with and without inhibition in this context (page 21, lines 4-13).

      (7) In the "Ca and kernel dynamics" plots (Fig 3,5), some of the kernel midlines (solid line) are overlapped by dots, e.g. the yellow line in Fig 3D, and some kernel midlines look like dots, which leads to confusion. Suggest to separate plots of Ca and kernel dynamics for clarity. 

      The design of the figures has been updated to improve the visibility of the calcium and kernel dynamics during training.

      (8) The formulations of the learning rule are not well-organized, and the naming of parameters is kind of confusing, e.g. T_min, T_max, which by default represent time, means "Ca concentration threshold" here.

      The abbreviations of the thresholds (T<sub>min</sub>, T<sub>max</sub> in the initial version) have been updated to CTL and CTH, respectively, to better reflect their role in tracking calcium levels. The mathematical formulations have also been reorganized for better readability. The revised Methods section now follows a more structured flow, first explaining the learning mechanisms, followed by the equations and their dependencies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

      The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

      This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

      Thank you for your positive view of our paper and for your previous comments.

      Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

      We agree that these are limitations of the existing study. We updated the limitations section as follows (page 15, line 539):

      “Similarly, this study falls short in several potential mechanistic insights, such as by investigating citation appropriateness via text similarity or international dynamics in authors who move between countries.”

      Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

      Reviewer #2 (Public review):

      The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

      Thank you for your comments. We have addressed your suggestions presented in the “Recommendations for the authors” section by performing your recommended sensitivity analysis that specifically identifies authors who could be considered neurologists, neuroscientists, and psychiatrists (as opposed to just papers that are published in these fields). Please see the “Recommendations for the authors” section for more details.

      Reviewer #3 (Public review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

      This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

      Thank you for your comment and for highlighting this insightful paper. After reading this paper, we believe that our theoretical estimand is descriptive in nature. For example, in the abstract of our paper, we state: “This work characterizes self-citation rates in basic, translational, and clinical Neuroscience literature by collating 100,347 articles from 63 journals between the years 2000-2020.” This goal seems consistent with the idea of a descriptive estimand, as we are not interested in any particular intervention or counterfactual at this stage. Instead, we seek to provide a broad characterization of subgroup differences in self-citations such that future work can ask more focused questions with causal estimands.

      Our analysis included subgroup means and generalized additive models, both of which were described as empirical estimands for a theoretical descriptive estimand in Lundberg et al. We added the following text to the paper (page 3, line 112):

      “Throughout this work, we characterized self-citation rates with descriptive, not causal, analyses. Our analyses included several theoretical estimands that are descriptive<sup>17</sup>, such as the mean self-citation rates among published articles as a function of field, year, seniority, country, and gender. We adopted two forms of empirical estimands. First, we showed subgroup means in self-citation rates. We then developed smooth curves with generalized additive models (GAMs) to describe trends in self-citation rates across several variables.”
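
      As a small illustration of the first empirical estimand mentioned above (subgroup means), the sketch below computes a mean self-citation rate per field. The records are hypothetical toy values, not data from the study, and the sketch assumes that the rate is first computed per article (self-citing references divided by total references) and then averaged within each subgroup; the study may aggregate differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (field, self_citing_references, total_references).
articles = [
    ("Neurology", 3, 40),
    ("Neurology", 1, 20),
    ("Psychiatry", 5, 50),
    ("Psychiatry", 0, 30),
]


def subgroup_means(records):
    """Mean per-article self-citation rate for each subgroup (here: field)."""
    rates = defaultdict(list)
    for field, n_self, n_refs in records:
        if n_refs > 0:  # skip articles without references to avoid division by zero
            rates[field].append(n_self / n_refs)
    return {field: mean(values) for field, values in rates.items()}


means = subgroup_means(articles)
```

      The same function applies to any grouping variable (year, seniority, country, gender) by changing the first element of each record.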

      In addition, we added to the limitations section as follows (page 15, line 539):

      “Yet, this study may lay the groundwork for future works to explore causal estimands.”

      Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

      Thank you for your previous comments. We agree that they improved the paper.

      Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough revisions and responses to the reviews.

      Reviewer #2 (Recommendations for the authors):

      I appreciate the authors' responses and am satisfied with all their replies except for my second comment. I still find the message conveyed slightly misleading, as the results seem to be generalized to neurologists, neuroscientists, and psychiatrists. It is important to refine the analysis to focus specifically on neuroscientists, identified as first or last authors based on their publication history. This approach is common in the science of science literature and would provide a more accurate representation of the findings specific to neuroscientists, avoiding the conflation with other related fields. This refinement could serve as a robustness check in the supplementary. I think adding this sub-analysis is essential to the validity of the results claimed in this paper.

      Thank you for your comment. We added a sensitivity analysis where fields are defined by an author’s publication history, not by the journal of each article.

      In the main text, we added the following:

      (Page 3, line 129) “When determining fields by each author’s publication history instead of the journal of each article, we observed similar rates of self-citation (Table S7). The 95% confidence intervals for each field definition overlapped in most cases, except for Last Author self-citation rates in Neuroscience (7.54% defined by journal vs. 8.32% defined by author) and Psychiatry (8.41% defined by journal vs. 7.92% defined by author).”

      Further details are provided in the methods section (page 21, line 801):

      “4.11 Journal-based vs. author-based field sensitivity analyses

      We refined our field-based analysis to focus only on authors who could be considered neuroscientists, neurologists, and psychiatrists. For each author, we looked at the number of articles they had in each subfield, as defined by Scopus. We considered 12 subfields that fell within Neurology, Neuroscience, and Psychiatry. These subfields are presented in Table S12. For each First Author and Last Author, we excluded them if any of their three most frequently published subfields did not include one of the 12 subfields of interest. If an author’s top three subfields included multiple broader fields (e.g., both Neuroscience and Psychiatry), then that author was categorized according to the field in which they published the most articles. Among First Authors, there were 86,220 remaining papers, split between 33,054 (38.33%) in Neurology, 23,216 (26.93%) in Neuroscience, and 29,950 (34.73%) in Psychiatry. Among Last Authors, there were 85,954 remaining papers, split between 31,793 (36.98%) in Neurology, 25,438 (29.59%) in Neuroscience, and 28,723 (33.42%) in Psychiatry.”
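
      The classification step described above can be sketched as follows. This is one possible reading, not the authors' code. The exclusion criterion is ambiguous as worded, so the sketch exposes both readings: by default an author is kept if at least one of their top three subfields is in scope, and `require_all=True` keeps an author only if all of them are. The subfield-to-field map is a hypothetical placeholder (the actual 12 subfields are listed in Table S12), and counting articles only within the in-scope top subfields is also an assumption.

```python
from collections import Counter

# Hypothetical subfield -> broad field map; the real list is in Table S12.
SUBFIELD_TO_FIELD = {
    "Neurology (clinical)": "Neurology",
    "Cellular and Molecular Neuroscience": "Neuroscience",
    "Cognitive Neuroscience": "Neuroscience",
    "Psychiatry and Mental Health": "Psychiatry",
}


def classify_author(subfield_counts, require_all=False):
    """Return the author's broad field, or None if the author is excluded.

    subfield_counts maps a subfield name to the author's article count in it.
    """
    top3 = [name for name, _ in Counter(subfield_counts).most_common(3)]
    in_scope = [name for name in top3 if name in SUBFIELD_TO_FIELD]
    if not in_scope or (require_all and len(in_scope) < len(top3)):
        return None
    # Sum article counts per broad field and pick the largest.
    per_field = Counter()
    for name in in_scope:
        per_field[SUBFIELD_TO_FIELD[name]] += subfield_counts[name]
    return per_field.most_common(1)[0][0]
```

      For example, an author with 10 articles in Cognitive Neuroscience, 7 in Psychiatry and Mental Health, and 2 in Cellular and Molecular Neuroscience would be assigned to Neuroscience under either reading, whereas an author whose top subfields fall entirely outside the map would be excluded.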

      Reviewer #3 (Recommendations for the authors):

      I would like to thank the authors for their responses to the points that I raised. I do not have any new comments or further responses.

    1. Author response:

      We appreciate that the reviewers recognize the conceptual novelty of our work and find our work interesting.

      Reviewer #1:

      We thank Reviewer #1 for making us aware that the image presentation of some of what we see as very clear phenotypes in our work might not have been optimal in the reviewed pdf file, presumably due to the relatively low resolution and lack of appropriately magnified images in the merged pdf file. This issue, if not caught and corrected now, might have caused future readers to similarly not appreciate these clear phenotypes. We will carefully revise the figures and ensure maintenance of appropriate pdf resolution in the merged file so that image presentation is optimal and our findings are appropriately represented.

      We appreciate that Reviewer #1 carefully and critically assessed the growth cone transcriptomic data. We agree that future additional validation is warranted, and this will be clearly stated in our revised paper. Because we judge that these data – even in their current form – will be of potential interest to other investigators sooner rather than later, we respectfully offer and request that we should share them in this paper as our attempt so far to identify elements of the relevant growth cone biology, rather than waiting for years before completing additional validation.

      Even upon repeated reflection, we judge and respectfully submit that our CRISPR in utero electroporation experiments are, indeed, conducted with appropriate controls. We thought through the potential controls deeply prior to completing these complex experiments. We will describe our reasoning in detail in our point-by-point response.

      Reviewer #2:

      We thank Reviewer #2 for encouraging us to elaborate on the direction and cross-repressive interplay between Bcl11a and Bcl11b, which we previously identified (Woodworth*, Greig* et al., Cell Rep, 2016). We omitted deep discussion because we had already published this result, cited that work, and did not want to seem overly self-referential, as well as for reasons of length. Though we know and have reported that Bcl11a and Bcl11b are cross-repressive in SCPN development, we currently do not know whether increased Bcl11a expression in Bcl11b-null SCPN contributes to reduced Cdh13 expression. Also, we do not know if there is a similar Bcl11a-Bcl11b cross-repression in striatal medium spiny neurons. This will be clarified in our revised paper.

      We agree fully with the reviewer that “the common practice of picking from a list of differentially expressed genes the most likely ones” has been useful for and has substantially contributed to the elucidation of molecular mechanisms in many systems, including in CNS development. Indeed, the current paper identifies Cdh13 as a newly recognized functional molecule in SCPN axon development in part by using this approach. Cdh13 belongs to a well-known gene family, and its expression by SCPN was already reported by us (Arlotta*, Molyneaux* et al., Neuron, 2005). Despite these two facts, we newly identify its function in SCPN development, which has never been investigated or reported. We appreciate the reviewer encouraging us to elaborate on this here.

      Recent technical advances allow functional screening of a larger list of genes in vivo (Jin et al., Science, 2020; Ramani et al., bioRxiv, 2024; Zheng et al., Cell, 2024). That said, it is still a challenge to specifically access SCPN in vivo and apply such a high-throughput screening assay for axon development. We agree and predict that future work of this type will likely lead to identification of other new and unknown molecular regulators. We respectfully submit that our work reported here will provide a useful foundation for many such future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, caused by the presence of some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. Even though the model is appealing and several of the experimental data support some aspects of it, several inconsistencies remain to be solved. In addition, even though TopAI was shown to be an inhibitor of topoisomerase I (Yamaguchi & Inouye, 2015, NAR 43:10387), the authors suggest, without offering any experimental support, that, because ribosome-targeting antibiotics act as inducers, expression of the topAI/yjhQ/yjhP operon may confer resistance to these drugs.

      Strengths:

      - There is good experimental support of the transcriptional repression/activation switch aspect of the model, derived from well-designed transcriptional reporters and ChIP-qPCR approaches.

      - There is a clever use of the topAI-lacZ reporter to find the 23S rRNA mutants where expression of topAI was upregulated. This eventually led the authors to identify that translation events occurring at toiL are important to regulate the topAI/yjhQ/yjhP operon. Is there any published evidence that ribosomes with the identified mutations translate slowly (decreased fidelity does not necessarily mean slow translation, does it?)?

      G2253 is in helix 80 of the 23S rRNA, which has been proposed to be involved in correct positioning of the tRNA. Mutations in helix 80 have been reported to cause defects in peptidyl transferase center activity, which could reduce the rate of ribosome movement along the mRNA. If ribosomes are sufficiently slowed when translating toiL, this could induce expression of topAI. G1911 and Ψ1917 are in helix 69 of the 23S rRNA, which is involved in forming the inter-subunit bridge, as well as interactions with release factors. Mutations in helix 69 cause a decrease in the processivity of translation, suggesting that the mutations we identified may increase the occupancy of ribosomes within toiL, thereby inducing expression of topAI. We have added text to the Discussion section to include this speculation.

      - Authors incorporate relevant links to the antibiotic-mediated expression regulation of bacterial resistance genes. Authors can also mention the tryptophan-mediated ribosome stalling at the tnaC leader ORF that activates the expression of tryptophan metabolism genes through blockage of Rho-mediated transcriptional attenuation.

      We have added a citation to a recent structural study of ribosomes translating the tnaC uORF. Specifically, we speculate in the Discussion that toiL may have evolved to sense a ribosome-targeting antibiotic, or another ribosome-targeting small molecule such as an amino acid.

      Weaknesses:

      The main weaknesses of the work are related to several experimental results that are not consistent with the model, or related to a lack of data that needs to be included to support the model.

      The following are a few examples:

      - It is surprising that the authors do not mention that several published Ribo-seq data from E. coli cells show active translation of toiL (for example Li et al., 2014, Cell 157: 624). Therefore, it is hard to reconcile with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression (Figure 2C, bar graphs of the no antibiotic control samples).

      These data are for a topAI-lux reporter construct rather than toiL-lux. In our model, ribosome stalling within toiL is required to induce expression of the downstream genes; preventing translation of toiL by mutating the start codon or Shine-Dalgarno sequence would not cause ribosome stalling, consistent with the lack of an effect on topAI expression.

      - The SHAPE reactivity data shown in Figure 5A are not consistent with the toiL ORF being translated. In addition, it is difficult to visualize the effect of tetracycline on mRNA conformation with the representation used in Figure 5B. It would be better to show SHAPE reactivity without/with Tet (as shown in panel A of the figure).

      We have modified this figure (now Figure 6) so that we no longer show the SHAPE-seq data +/- tetracycline overlaid on the predicted RNA structure, since at best, the predicted structure likely represents only the uninduced state. We have included the predicted structure together with the SHAPE-seq data for untreated cells as a separate panel because it is part of the basis for our model. We have also added a supplementary figure showing a similar RNA structure prediction based on conservation of the topAI upstream region across species (Figure 6 – figure supplement 1), and we describe this in the text.

      - The "increased coverage" of topAI/yjhP/yjhQ in the presence of tetracycline from the Ribo-seq data shown in Figure 6A can be due to activation of translation, transcription, or both. For readers to know which of these possibilities apply, authors need to provide RNA-seq data and show the profiles of the topAI/yjhQ/yjhP genes in control/Tet-treated cells.

      A previous study (Li et al., 2014, PMID 24766808) compared RNA-seq and Ribo-seq data for E. coli to measure normalized ribosome occupancy for each gene. However, sequence coverage for topAI was too low to confidently quantify either the RNA-seq or the Ribo-seq data. Presumably RNA levels were low because of Rho termination. Hence, we were not confident that RNA-seq would provide information on the regulation of topAI-yjhQP. Other data in our study provide strong evidence that regulation is primarily at the level of translation. And the key conclusion from Figure 6 (now Figure 7) is that tetracycline stalls ribosomes on start codons.

      - Similarly, to support the data of increased ribosomal footprints at the toiL start codon in the presence of Tet (Figure 6B), authors should show the profile of the toiL gene from control and Tet-treated cells.

      Figure 6B shows data for both treated and untreated cells. The overall ribosome occupancy is much lower for untreated cells, making it difficult to draw strong conclusions about the relative distribution of ribosomes across toiL.

      - Representation of the mRNA structures in the model shown in Figure 5, does not help with visualizing 1) how ribosomes translate toiL since the ORF is trapped in double-stranded mRNA, and 2) how ribosome stalling on toiL would lead to the release of the initiation region of topAI to achieve expression activation.

      We now show the predicted structure with only SHAPE-seq data for untreated cells. The comparison of SHAPE-seq +/- tetracycline is shown without reference to the predicted structure.

      - The authors speculate that, because ribosome-targeting antibiotics act as expression inducers [by the way, authors should mention and comment that, more than a decade ago, it had been reported that kanamycin (PMID: 12736533) and gentamycin (PMID: 19013277) are inducers of topAI and yjhQ], the genes of the topAI/yjhQ/yjhP operon may confer resistance to these antibiotics. Such a suggestion can be experimentally checked by simply testing whether strains lacking these genes have increased sensitivity to the antibiotic inducers.

      We thank the reviewer for pointing out these references, which we now cite. The fact that another group found that gentamycin induces topAI expression – it is one of the most highly induced genes in that paper – strongly suggests that we missed the key inducing concentrations for one or more antibiotics, meaning that topAI is induced by even more ribosome-targeting antibiotics than we realized.

      We did some preliminary experiments to look for effects of TopAI, YjhQ, and/or YjhP on antibiotic sensitivity, but generated only negative results. Since these experiments were preliminary and far from exhaustive, we have chosen not to include them in the manuscript. Other studies of genes regulated by ribosome stalling in a uORF have looked at genes whose functions in responding to translation stress were already known, so the environmental triggers were more obvious. With so many possible triggers for topAI-yjhQP, it will likely require considerable effort to find the relevant trigger(s). Hence, we consider this an important question, but beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this important study, Baniulyte and Wade describe how the translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      I appreciate that the authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation. The results are convincing and clearly described.

      Weaknesses:

      I have relatively minor suggestions for improving the manuscript. These mainly relate to the figures.

      Reviewer #3 (Public Review):

      Summary:

      The authors nicely show that the translation and ribosome stalling within the ToiL uORF upstream of the co-transcribed topAI-yjhQ toxin-antitoxin genes unmask the topAI translational initiation site, thereby allowing ribosome loading and preventing premature Rho-dependent transcription termination in the topAI region. Although similar translational/transcriptional attenuation has been reported in other systems, the base pairing between the leader sequence and the repressed region by the long RNA looping is somehow unique in toiL-topAI-yjhQP. The experiments are solidly executed, and the manuscript is clear in most parts with areas that could be improved or better explained. The real impact of such a study is not easy to appreciate due to a lack of investigation on the physiological consequences of topAI-yjhQP activation upon antibiotic exposure (see details below).

      Strengths:

      Conclusion/model is supported by the integrated approaches consisting of genetics, in vivo SHAPE-seq and Ribo-Seq.

      Provide an elegant example of cis-acting regulatory peptides to a growing list of functional small proteins in bacterial proteomes.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Examine the consequences of mutations impeding translation of the topAI/yjhQ/yjhP operon on cell growth in the presence and absence of antibiotics.

      See response to Reviewer 1’s comment.

      (2) Resolve discrepancies between the SHAPE data indicating constitutive sequestration of the toiL Shine Dalgarno sequence with antibiotic-regulated translation of the toiL ORF.

      See response to Reviewer 1’s comment.

      (3) Reconcile published Ribo-Seq data with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression in the absence of antibiotics.

      See response to Reviewer 1’s comment.

      (4) Clarify whether antibiotic MIC values were employed to select antibiotic concentrations for different experiments.

      The antibiotic concentrations we used are in line with reported MICs for E. coli. We now list the reported ECOFFs/MICs and include relevant citations.

      (5) Provide RNA-seq data to complement the Ribo-Seq data for the topAI/yjhQ/yjhP genes in control vs. Tet-treated cells.

      See response to Reviewer 1’s comment.

      (6) Revise the text to address as many of the reviewers' suggestions as reasonably possible.

      Changes to the text have been made as indicated in the responses to the reviewers’ comments.

      Reviewer #2 (Recommendations for the Authors):

      (1) Page 6: I would have liked to have more information about the 39 suppressor mutations in rho. Do any of the cis-acting mutations give support for the model proposed in Figure 8?

      We only know the specific mutation for some of the strains, and we now list those mutations in the Methods section. For other mutants, we mapped the mutation to either the rho gene or to Rho activity, but we did not sequence the rho gene. Most of the specific mutations we did identify fall within the primary RNA-binding site of Rho and hence should be considered partial-loss-of-function mutations (complete loss of function would be lethal).

      We identified cis-acting mutations by re-transforming the lacZ reporter plasmid into a wild-type strain. We did not sequence any of these plasmids.

      (2) Page 12-13, Section entitled "Mapping ribosome stalling sites induced by different antibiotics": This section should start with a better transition regarding the logic of why the experiments were carried out and should end with an interpretation of the results.

      We have added a few sentences at the start of this section to explain the rationale. We have also added two sentences at the end of this section to summarize the interpretation of the data.

      (3) Page 15: The authors should discuss under what conditions the expression of TopAI (and YjhQ/YjhP) might be induced. Is expression also elevated upon amino acid starvation?

      We have looked through public RNA-seq data but have not identified growth conditions other than antibiotic treatment that induce expression of topAI, yjhQ or yjhP.

      (4) References: The authors should be consistent about capitalization, italics, and abbreviations in the references.

      These formatting errors will be fixed in the proofing stage.

      (5) All graph figures: There should be more uniformity in the sizes of individual data points (some are almost impossible to see) and error bars across the figures.

      We have tried to make the data points and error bars more visible for figures where they were smaller.

      (6) Figure 1B: I do not think the left arrow labeling is very intuitive and suggest renaming these constructs.

      We have removed the arrows to improve clarity.

      (7) Figure 2A: toiL should be introduced at the first mention of Figure 2A.

      We have added a schematic of the topAI-yjhQ-yjhP region as Figure 1A, including the toiL ORF, which we briefly mention in the text. We have opted to split Figure 2C into two panels. In Figure 2C we now only show data for the wild-type construct. Data for the mutant constructs are now shown in a new figure (Figure 5), alongside data for the wild-type constructs. We have simplified Figure 2A, since the mutations are not relevant to this revised figure, and we now show the schematic with the mutations as Figure 5A.

      (8) Figure 3C and 3D: I suggest giving these graphs headings (or changing the color of the bars in Figure 3D) to make it more obvious that different things are measured in the two panels.

      We have added headers to panels B-D to make it clear which graphs show ChIP-qPCR data and which graph shows qRT-PCR data.

      (9) Figure 6: It might be nice to show the topAI-yjhPQ operon here.

      We now show the operon in Figure 1A.

      (10) Figure 8: This figure could be optimized by adding 5' and 3' end labels and having more similarity with the model in Figure 7.

      The constructs shown in Figure 7 lack most of the topAI upstream region, so they aren’t readily comparable to the schematic in Figure 8. However, we have changed the color of the ribosome in Figure 7 to match that in Figure 8. We also indicate the 5’ end of the RNA in Figure 8.

      Reviewer #3 (Recommendations for the Authors):

      Areas to improve:

      (1) While it's important to learn about ToiL-dependent regulation of the downstream topAI-yjhQ toxin-antitoxin genes, the physiological consequence of topAI-yjhQ activation seems to be lost in the manuscript. Everything was done with a lacZ/lux reporter. In the absence of toiL translation (i.e. SD mutant) and/or ribosome stalling, does premature transcription termination result in non-stoichiometric synthesis of toxin vs. antitoxin, leading to growth arrest or another measurable phenotype? Knowing the impact of ToiL in the native topAI-yjhQ context will be valuable.

      See response to Reviewer 1’s comment.

      (2) It was indicated in Figure 4-figure supplement 1 that toiL homologs are found in many other proteobacteria. Do the UR sequences in those species also form a similar inhibitory RNA loop? The nt sequence identity of toiL is likely to be constrained by the base pairing of the topAI 5' region.

      We have added a supplementary figure panel showing an RNA structure prediction for the topAI upstream region based on sequence alignment of homologous regions from other species (Figure 6 – figure supplement 1).

      What is the genome-wide frequency of the MLENVII hepta-peptide in E. coli? Is the sequence disfavored to avoid spurious multi-antibiotic sensing?

      LENVII is not found in any annotated E. coli K-12 protein. However, this is a sufficiently long sequence that we would expect few to no instances in the E. coli proteome.
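
      The expectation stated above can be checked with a back-of-envelope calculation. The sketch below assumes, purely for illustration, a proteome of roughly 1.3 million residues and uniform, independent amino-acid usage; neither assumption comes from the manuscript, and protein boundaries are ignored. The helper names and the toy sequences are hypothetical.

```python
def expected_occurrences(peptide: str, proteome_residues: int, alphabet_size: int = 20) -> float:
    """Expected number of exact matches under uniform, independent residue usage."""
    # Number of start positions, treating the proteome as one long string (a simplification).
    positions = max(proteome_residues - len(peptide) + 1, 0)
    return positions * (1.0 / alphabet_size) ** len(peptide)


def count_occurrences(peptide: str, proteins: dict[str, str]) -> dict[str, int]:
    """Count matches of `peptide` (overlaps allowed) in each protein; omit proteins with none."""
    hits = {}
    for name, seq in proteins.items():
        n = sum(1 for i in range(len(seq) - len(peptide) + 1) if seq.startswith(peptide, i))
        if n:
            hits[name] = n
    return hits


# Assumed proteome size, for illustration only.
ASSUMED_PROTEOME_RESIDUES = 1_300_000
print(expected_occurrences("LENVII", ASSUMED_PROTEOME_RESIDUES))

# Toy sequences, not real E. coli proteins.
toy = {"protA": "MKLENVIIGG", "protB": "MSTNPKPQRK"}
print(count_occurrences("LENVII", toy))
```

      Under these assumptions the expected count is on the order of 0.02, which is in line with the statement that few to no instances would be expected by chance. Real amino-acid frequencies are not uniform, so the true expectation will differ somewhat.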

      (3) Figure 1A, it would be helpful to indicate the location of the toiL (red arrow as in Figure 2A) relative to the putative rut site early in the beginning of the results. Does TSS mark the transcription start site? There is no annotation of TSS in the figure legend. Was TSS previously mapped experimentally? Please include relevant citations.

      We now indicate the position of the TSS relative to the topAI start codon. Similarly, we indicate the position of the start of toiL relative to the topAI start codon in Figure 2A. We now explain “TSS” in the figure legend. There is a reference in the text for the TSS (Thomason et al., 2015).

      (4) Please consider rearranging the results section, perhaps more helpful to introduce the toiL in Figure 1 or earlier. The current format requires readers to switch back-and-forth between Figure 4 and Figure 2.

      We have added a schematic of the topAI upstream region as Figure 1A, and we have separated Figure 2C as described in a response to a comment from Reviewer 2.

      (5) Figure 2A and Figure 2-Figure Suppl 1A, for clarity, please mark the rut site upstream of the red arrow.

      Rather than mark the rut site on Figure 2A, which would make for a busy schematic, readers can compare the position of the rut site to that of toiL, which we have now added to Figures 1B (formerly Figure 1A) and 2A.

      (6) The following conclusion seems speculative: "...but does not trigger termination until RNAP ..., >180 nt further downstream…". Shouldn't the authors already know where the termination site is based on their previous Term-seq data (see Ref 1, Adams PP et al 2021)?

      Sites of Rho-dependent transcription termination cannot be mapped precisely from Term-seq data because exoribonucleases rapidly process the unstructured RNA 3’ ends.

      (7) Genetic screen: Please discuss why the 23S rRNA mutations that cause translational infidelity could promote topAI translation. Wouldn't the mutant ribosome be affected in translating toiL?

      See response to Reviewer 1’s comment.

      (8) Although antibiotic concentrations were provided in Figure 2 legend, please provide the MIC values of each antibiotic, e.g., in Table S2, for the tested E. coli strain, to inform readers how specific subinhibitory concentrations were chosen.

      See response to Reviewing Editor.

      (9) Please clarify the calculation of luciferase units in the y-axis of Figure 2A, why the scale is drastically higher than that of Figure 7C using the same antibiotics?

      These reporter assays use different constructs. The reporter construct used for experiments in Figure 7 includes a portion of the ermCL gene and associated downstream sequence. We have enlarged Figure 7A to highlight the difference in reporter constructs.

      (10) Table S4 needs a few more details. It is unclear how those numbers in columns G-H were generated. Do those numbers correspond to ribosome density per nt/ORF?

      We have added footnotes to Table S4 to indicate that the numbers in columns G and H represent sequence read coverage normalized by region length and by the upper quartile of gene expression.
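
      One plausible reading of this normalization is sketched below: read coverage is divided by region length, and the resulting density is scaled by the upper quartile of per-gene densities. The exact procedure used for Table S4 is not given in this response, so the function names, the choice to take the quartile over non-zero genes, the quantile method, and the toy numbers are all assumptions.

```python
from statistics import quantiles


def upper_quartile(values):
    """75th percentile of the non-zero values (one common convention; an assumption here)."""
    nonzero = [v for v in values if v > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero values")
    return quantiles(nonzero, n=4, method="inclusive")[2]


def normalized_coverage(region_reads, region_length, gene_densities):
    """Reads per nucleotide for a region, scaled by the upper quartile of per-gene densities."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    density = region_reads / region_length
    return density / upper_quartile(gene_densities)


# Toy values for illustration only (reads per nt for each gene in a hypothetical library).
gene_densities = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
print(normalized_coverage(region_reads=120, region_length=30, gene_densities=gene_densities))
```

      Scaling by an upper quartile rather than by total read count makes the value less sensitive to a few very highly expressed genes, which is a common motivation for this choice.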

      (11) Figure 5, if the SHAPE results were true, the Shine Dalgarno sequence of toiL is sequestered in the hairpin structure with and without tetracycline treatment. It is inconceivable that translational initiation will occur efficiently, please discuss.

      Our representation of the SHAPE-seq data was confusing since we overlaid the SHAPE-seq changes on a predicted structure that likely corresponds to the uninduced state. We hope that the new version of Figure 5 is clearer.

      We presume the reviewer is referring to the Shine-Dalgarno sequence of topAI rather than toiL, since the Shine-Dalgarno sequence of toiL is predicted to be unstructured even in the absence of tetracycline treatment. The ribosome-binding site of topAI is more accessible in cells treated with tetracycline, although the SHAPE-seq data suggest that this is a transient event. The binding of the initiating ribosome may also reduce reactivity in this region under inducing conditions. We now discuss this briefly in the text.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long read sequencing on a subset of isolates (ST10 and ST74), and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels. Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors are likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. The methodology in both approaches was sound and written in sufficient detail, and data analyses were performed with rigour. Source data were fully presented and accessible to readers.

      Comments on revised version: 

      The authors have addressed all the points raised by the reviewer. The manuscript is now much enhanced in clarity and accuracy. The re-written Discussion is more relevant and brings in comparison with other invasive Salmonella serotypes. 

      Comments: 

      In light of the metadata supplied in this revision, for Australian isolates, all human cases of ST74 (n=7) were from faeces (presumably from gastroenteritis) while 18/40 of ST10 were from invasive specimens (blood and abscess). This may contradict the manuscript's findings and discussion of the different experimental phenotypes of the two STs, with ST74 showing more replication in macrophages and potentially being more invasive. Thus, the reviewer suggests that the authors mention this disparity in the Discussion and discuss possible reasons underlying it. This can strengthen the authors' rationale for further in vivo studies.

      We thank the reviewer for pointing out this important observation. We have amended the text in the Discussion to address the differences in source of human cases as suggested by the Reviewer (lines 392-430). We have also included text highlighting the important knowledge gaps in understanding the drivers for emerging iNTS with broad host ranges and identify future avenues of research that could be explored to better understand the observed differences in the host-pathogen interactions.  

      Reviewer #2 (Public review): 

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understand its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

      We thank the reviewer for their comments and acknowledge the limitations of this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors repeatedly assert that an individual's behavior in the foraging assay depends on its prior history (particularly cultivation conditions). While this seems like a reasonable expectation, it is not fully fleshed out. The work would benefit from studies in which animals are raised on more or less abundant food before the behavioral task.

      Cultivation density: While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is an interesting experiment, it is not feasible at this time. We previously attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. Thus, we focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction (lines 618-624).

      (2) The authors convincingly show that the probability of particular behavioral outcomes occurring upon patch encounter depends on time-associated parameters (time since last patch encounter, time since last patch exploitation). There are two concerns here. First, it is not clear how these values are initialized - i.e., what values are used for the first occurrence of each behavioral state? More importantly, the authors don't seem to consider the simplest time parameter, the time since the start of the assay (or time since worm transfer). Transferring animals to a new environment can be associated with significant mechanical stimulus, and it seems quite possible that transferring animals causes them to enter a state of arousal. This arousal, which certainly could alter sensory function or decision-making, would likely decay with time. It would be interesting to know how well the model performs using time since assay starts as the only time-dependent parameter.

      Parameter Initialization: We thank the reviewer for pointing out an oversight in our methods section regarding the model parameter values used for the first encounter. We clarified the initialization of parameters in the manuscript (lines 1162-1179). In short, for the first patch encounter where k = 1:

      ρ<sub>k</sub> is the relative density of the first patch.

      τ<sub>s</sub> is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρ<sub>h</sub> is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD<sub>600</sub> = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρ<sub>e</sub> is equivalent to ρ<sub>h</sub>.
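
      The initialization described above can be written out as a small sketch. The container and function names are hypothetical and the numeric values are toy inputs; only the assignment rules for the first encounter (k = 1) are taken from the list above.

```python
from dataclasses import dataclass


@dataclass
class EncounterCovariates:
    rho_k: float  # relative density of the patch being encountered
    tau_s: float  # time spent off food; for k = 1, the total time elapsed in the recording
    rho_h: float  # for k = 1, the approximated relative density of the acclimation-plate patch
    rho_e: float  # for k = 1, set equal to rho_h


def initial_covariates(first_patch_density: float,
                       time_elapsed: float,
                       acclimation_density: float) -> EncounterCovariates:
    """Covariates for the first patch encounter (k = 1), following the rules listed above."""
    if time_elapsed < 0:
        raise ValueError("time_elapsed must be non-negative")
    return EncounterCovariates(
        rho_k=first_patch_density,
        tau_s=time_elapsed,          # no earlier patch, so off-food time equals elapsed time
        rho_h=acclimation_density,
        rho_e=acclimation_density,   # rho_e is equivalent to rho_h at k = 1
    )


# Toy values for illustration only; units and magnitudes are assumptions.
first = initial_covariates(first_patch_density=0.5, time_elapsed=120.0, acclimation_density=200.0)
print(first)
```

      How these quantities are updated for later encounters (k > 1) is described in the manuscript's Methods and is deliberately left out of this sketch.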

      Transfer Method: We thank the reviewer for their thoughtful comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We anticipated this possibility and, in order to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observed no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick. We added these additional methodological details to the methods (lines 791-796).

      Time Parameter: However, the reviewer’s concern that the simplest time parameter (time since start of the assay) might better predict animal behavior is valid. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. To test this hypothesis, we ran our model with varying combinations of the satiety term τ<sub>s</sub> and a transfer term τ<sub>t</sub>. We found that when both terms were included in the model, the coefficient of the transfer term was non-significant. This result suggests that the relevant time-dependent term is more likely related to satiety than transfer-induced stress (lines 343-358; Figure 4 - supplement 4D).

      (3) Similarly, Figures 2L and M clearly show that the probability of a search event occurring upon a patch encounter decreases markedly with time. Because search events are interpreted as a failure to detect a patch, this implies that the detection of (dilute) patches becomes more efficient with time. It would be useful for the authors to consider this possibility as well as potential explanations, which might be related to the point above.

      Time-dependent changes in sensing: We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we added this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect (lines 563-568).

      (4) Based on their results with mec-4 and osm-6 mutants, the authors assert that chemosensation, rather than mechanosensation, likely accounts for animals' ability to measure patch density. This argument is not well-supported: mec-4 is required only for the function of the six non-ciliated light-touch neurons (AVM, PVM, ALML/R, PLML/R). In contrast, osm-6 is expected to disrupt the function of the ciliated dopaminergic mechanosensory neurons CEP, ADE, and PDE, which have previously been shown to detect the presence of bacteria (Sawin et al 2000). Thus, the paper's results are entirely consistent with an important role of mechanosensation in detecting bacterial abundance. Along these lines, it would be useful for the authors to speculate on why osm-6 mutants are more, rather than less, likely to "accept" when encountering a patch.

      Sensory mutant behavior: We thank the reviewer for pointing out the error in our interpretation of the behavior of osm-6 and mec-4 animals. We further elaborated on our findings and edited the text to better reflect that osm-6 mutants lack both chemosensory and mechanosensory ciliated sensory neurons (lines 406-448; lines 567-577). Specifically, we provided some commentary on the finding that osm-6 mutants show an augmented ability to detect the presence of bacterial patches but a reduced ability to assess their bacterial density. While this finding seems contradictory, it suggests that in the absence of the ability to assess bacterial density, animals must prioritize exploiting food resources when available.

      (5) While the evidence for the accept-reject framework is strong, it would be useful for the authors to provide a bit more discussion about the null hypothesis and associated expectations. In other words, what would worm behavior in this assay look like if animals were not able to make accept-reject decisions, relying only on exploit-explore decisions that depend on modulation of food-leaving probability?

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      Reviewer #3 (Public review):

      (1) Sensing vs. non-sensing

      The authors claim that when animals encounter dilute food patches, they do not sense them, as evidenced by the shallow deceleration that occurs when animals encounter these patches. This seems ethologically inaccurate. There is a critical difference between not sensing a stimulus, and not reacting to it. Animals sense numerous stimuli from their environment, but often only behaviorally respond to a fraction of them, depending on their attention and arousal state. With regard to C. elegans, it is well-established that their amphid chemosensory neurons are capable of detecting very dilute concentrations of odors. In addition, the authors provide evidence that osm-6 animals have altered exploit behaviors, further supporting the importance of amphid chemosensory neurons in this behavior.

      Interpretation of “non-sensing” encounters: We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that  “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 8A-C and Patch encounter classification as sensing or non-responding in Methods). Regardless, we agree with the reviewer that all that can be asserted about these events is that animals do not appear to respond to the bacterial patch in any way that we measured. Therefore, we have replaced the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events and clarified the text to reflect this change (lines 193-200; lines 211-212).

      (2) Search vs. sample & sensing vs. non-sensing

      In Figures 2H and 2I, the authors claim that there are three behavioral states based on quantifying average velocity, encounter duration, and acceleration, but I only see two. Based on density distributions alone, there really only seem to be 2 distributions, not 3. The authors claim there are three, but to come to this conclusion, they used a QDA, which inherently is based on the authors training the model to detect three states based on prior annotations. Did the authors perform a model test, such as the Bayesian Information Criterion, to confirm whether 2 vs. 3 Gaussians is statistically significant? It seems like the authors are trying to impose two states on a phenomenon with a broad distribution. This seems very similar to the results observed for roaming vs. dwelling experiments, which again, are essentially two behavioral states.

      Validation of sensing clusters: We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the semi-supervised QDA approach. We added additional visualizations and methods to validate the clusters we have discovered. Specifically, we used Silverman’s test to show that the sensing vs. non-responding data were bi-modal (i.e. a two-cluster classification method fits best) and accompanied this statistical test with heat maps which better illustrate the clusters (lines 171-173; lines 190-191; lines 948-972; lines 1003-1005; Figure 2 - supplement 6A-C; Figure 2 - supplement 7C-F).

      Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit). It’s important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (now changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-responding exploratory events) resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-sensing).
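
      The two-step classification described above can be sketched as follows. This is an illustration only, not the authors' code: the function name and the boolean inputs are hypothetical, and it assumes that an exploit call from the GMM always overrides the sensing call from the QDA.

```python
def encounter_type(is_exploit: bool, is_sensing: bool) -> str:
    """Combine two binary labels into one of three encounter types.

    is_exploit: hypothetical output of the two-cluster GMM (explore vs. exploit).
    is_sensing: hypothetical output of the two-cluster semi-supervised QDA
                (sensing vs. non-responding).
    """
    if is_exploit:
        # Exploitation is assumed to imply sensing, so the QDA label is not consulted.
        return "exploit"
    # Only the explore cluster is split in two.
    return "sample" if is_sensing else "search"
```

      Under these assumptions, two binary classifiers yield three (not four) encounter types, because the explore cluster is the only one that is subdivided.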

      (4) History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      Model design: We thank the reviewer for their thoughtful comments on the model. We completed a number of analyses involving model selection including model selection criteria (AIC, BIC) and optimization with regularization techniques (LASSO and elastic nets) and found that the problem of model selection was compounded by the enormous array of highly-correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and combinatorial number of predictors to possibly include failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was, we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences enough to determine if encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study.

      Lastly, in regards to the use of only sensed patches in the model; while we acknowledge that we are not certain as to whether the “non-responding” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events. We have added additional commentary about our model to the discussion section (lines 667-695).

      (5) osm-6

      The osm-6 results are interesting. This seems to indicate that the worms are still sensing the food, but are unable to assess quality, therefore the default response is to exploit. How do you think the worms are sensing the food? Clearly, they sense it, but without the amphid sensory neurons, and not mechanosensation. Perhaps feeding is important? Could you speculate on this?

      We thank the reviewer for their thoughtful remarks. We have added additional commentary about the result of our sensory mutant experiments as described above in response to Reviewer #1 under Sensory mutant behavior.

      (7) Impact:

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of the model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors title the work as an "ethological study" and emphasize the theme of "foraging in naturalistic environments" in contrast to typical laboratory conditions. The only difference in this study relative to typical laboratory conditions is that the food bacteria is distributed in many small patches as compared to one large patch. First, it is not clear to the reviewer that the size of the food patches in these experiments is more relevant to C. elegans in its natural context than the standard sizes of food patches. Furthermore, all the other highly unnatural conditions typical of laboratory cultivation still apply: the use of a 2D agar substrate, a single food bacteria that is not a component of a naturalistic diet, and the use of a laboratory-adapted strain of C. elegans with behavior quite distinct from that of natural isolates. The reviewer is not suggesting that the authors need to make their experiments more naturalistic, only that the experiments as described here should not be described as naturalistic or ethological as there is no support for such claims.

      Ethological interpretation: We thank the reviewer for their comments about the use of the term ethological to describe this study. We chose to develop a patchy bacterial assay to mimic the naturalistic “boom-or-bust” environment. While we agree with the reviewer that we do not know if the size and distribution of the food patches in these experiments are more relevant to C. elegans, we maintain that these experiments were ecologically-inspired and revealed behavior that is difficult to observe in environments with large, densely-seeded bacterial patches. We have updated our text to better reflect that this study was “ecologically-inspired” rather than truly “ethological” in nature (lines 94, 693).

      The main finding of the paper is that worms explore and then exploit, i.e. they frequently reject several bacterial patches before accepting one. This result requires additional scrutiny to reject other possible interpretations. In particular, when worms are transferred to a new plate we would expect some period of increased arousal due to the stressful handling process. A high arousal state might cause rejection of food patches. Could the measured accept/reject decisions be influenced by this effect? One approach to addressing this concern would be to allow the animals to acclimate to the new plate on a bare region before encountering the new food patches.

      We thank the reviewer for their comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We addressed this above in response to Reviewer #1 under Transfer Method and Time Parameter. In brief, we used a worm picking method that mitigated stress and added additional analyses showing that a transfer-related term was less predictive than a satiety-related term.

      Related to the above, in what circumstances exactly are the authors claiming that worms first explore and then exploit? After being briefly deprived of food? After being handled?

      Explore-then-exploit: All animals were well-fed and handled gently as described above under Transfer Method (lines 787-795). Our results suggest that the appearance of an explore-then-exploit strategy is a byproduct of being transferred from an environment with high bacterial density to an environment with low bacterial density as described in the manuscript (lines 461-466).

      The authors emphasize their analysis of the accept/reject decision as a critical innovation. However, the accept/reject decision does not strike me as substantially different from the previously described stay/switch decision. When a worm encounters a new patch of bacteria, accepting this bacteria is equivalent to staying on it and rejecting (leaving) it is equivalent to switching away from it. The authors should explain how these concepts are significantly distinct.

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      During patch encounter classification, the authors computed three of the animals' behavioral metrics (Line 801-804) and claimed that the combination of these three metrics reveals two non-Gaussian clusters representing encounters where animals sensed the patch or did not appear to sense the patch. The authors also refer to a video to demonstrate the two clusters by rotating the 3-dimension scatter plot. However, the supposed clusters, if any, are difficult to see in a 3D (Video 5) or in a 2D scatter plot (Figure 3I). The authors need to clearly demonstrate the distinct clustering as claimed in the paper as this feature is fundamental and necessary for the model implementation and interpretation of results.

      We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters. We added additional visualizations and methods to validate the clusters we have discovered as described in our above response to Reviewer #3 under Validation of sensing clusters.

      When selecting parameters (covariates) for their model, it is critical to avoid overfitting. Therefore, the authors used AIC and BIC (Figure 4- supplement 1) to demonstrate that the full GLM model has a better model performance than the other models which contain only a subset of the full covariates (in a total of 5). However, the authors compare the full set with only 4 other models whereas the total number of models that need to be compared with is 2^5-2. The authors at least need to include the AIC and BIC scores of all possible models in order to draw the conclusion about the performance of the full model.

      Model selection criterion: We thank the reviewer for pointing out this gap in our methodology. We have now run the model with all combinations of subsets of model parameters and have confirmed that the model with all 5 covariates outperforms all other models even when using BIC, the strictest criterion for overfitting (Figure 4 - supplement 1A). The only other model that performs well (though not as often as the 5-term model) is the 4-term model lacking ρ<sub>h</sub>. This result is not surprising as ρ<sub>h</sub> only changes substantially once in an animal’s encounter history for the single-density, multi-patch data that this model was fit to. For example, for an animal foraging on patches of density 10, on the first encounter ρ<sub>h</sub> = ~200 (see Parameter initialization above), but on every subsequent encounter ρ<sub>h</sub> = ~10. As a result, the effect of ρ<sub>h</sub> on the probability of exploiting is somewhat binary on the single-density, multi-patch data set. Nevertheless, we see significantly improved prediction of behavior in the novel multi-density, multi-patch data (Figure 4F) as we observe an effect of the most recently encountered patch. Additionally, we observe a similar impact (i.e., significant coefficient of negative sign) of the ρ<sub>h</sub> term when the model is fit to the multi-density, multi-patch data set (Figure 4 - supplement 4D).
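
      To make the counting in this exchange concrete, the sketch below enumerates every non-empty covariate subset and gives the standard AIC and BIC formulas. It is a minimal illustration, not the authors' pipeline: the covariate names are placeholders, and the log-likelihood of each fitted model would have to come from whatever GLM fitting routine is used.

```python
from itertools import combinations
from math import log

# Placeholder names; the five covariates of the actual model are not reproduced here.
COVARIATES = ["x1", "x2", "x3", "x4", "x5"]


def all_subsets(covariates):
    """Every non-empty subset of the covariates, including the full set."""
    return [
        combo
        for size in range(1, len(covariates) + 1)
        for combo in combinations(covariates, size)
    ]


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * log(n_obs) - 2 * log_likelihood


subsets = all_subsets(COVARIATES)
reduced_models = [s for s in subsets if len(s) < len(COVARIATES)]
```

      With five covariates there are 31 non-empty subsets, of which 30 (2^5 - 2) are reduced models to compare against the full model. Lower AIC or BIC indicates the preferred model, and BIC penalizes each extra parameter more heavily than AIC once ln(n) exceeds 2.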

      In any bacterial patch, the edges have a higher density of bacteria than the patch center. Thus, it is possible that a worm scans the patch edge density, on the basis of which it decides to accept or reject the patch whose average density is smaller. This could potentially cause an underestimate of the bacteria density used in the model. Furthermore, the potential inhomogeneity of the patch may further complicate the worm's decision-making, and the discrepancy between the reality and the model assumption will reduce the validity of the model. The authors need to estimate the inhomogeneity of the bacterial patches used in their assays and discuss how the edge effects may affect their results and conclusions.

      Bacterial patch inhomogeneity: We extensively tested the landscape of the bacterial patches by imaging fluorescently-labeled bacteria OP50-GFP (Bacterial Patch Density in Methods; Figure 2 - supplement 1-3). As the reviewer mentions, we observe significantly greater bacterial density at the patch edge. This within-patch spatial inhomogeneity results from areas of active proliferation of bacteria and likely complicates an animal’s ability to accurately assess the quantity of bacteria within a patch and, consequently, our ability to accurately compute a metric related to our assumptions of what the animal is sensing. In our study, we used the relative density of the patch edge where bacterial density is highest as a proxy for an animal’s assessment of bacterial patch density (Figure 2 – supplement 1). This decision was based on a previous finding that the time spent on the edge of a bacterial patch affected the dynamics of subsequent area-restricted search. While within-patch spatial inhomogeneity likely affects an animal’s ability to assess patch density, we do not believe that this qualitatively affects the results of our study. Both the patch densities tested (Figure 2 – supplement 3A) as well as our observations of time-dependent changes in exploitation (Figure 2E,N-O; Figure 3H-I) maintained a monotonic relationship. Therefore, alternative methods of patch density estimation should yield similar results. We have added additional discussion on this topic to our manuscript (lines 578-593).

      The authors claim that their methods (GMM and semi-supervised QDA) are unbiased. This seems unlikely as the QDA involves supervision. The authors need to provide additional explanation on this point.

      Semi-supervised QDA labelling: We have removed the term “unbiased” to avoid any misinterpretation of the methodology and clarified our method of labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory as we assume that sensation is prerequisite to exploitation; and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD<sub>600</sub> = 0). All other points were non-labeled prior to learning the model. In this way, our labels were based on the experimental design and results of the GMM, an unsupervised method; rather than any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters, by minimizing the posterior variance of classification (lines 1012-1021). See Figure 2 - supplement 8A-B for a visualization showing the labelled data.
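
      The two labelling assumptions can be written as a small rule. This is a sketch under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, and the iterative QDA fit itself is not shown.

```python
from typing import Optional


def initial_label(is_exploit: bool, od600: float) -> Optional[str]:
    """Seed label for one encounter before the semi-supervised fit.

    is_exploit: hypothetical flag from the unsupervised GMM step.
    od600: bacterial density of the encountered patch (0 means no bacteria).
    """
    if is_exploit:
        # Assumption 1: exploitation requires that the patch was sensed.
        return "sensing"
    if od600 == 0:
        # Assumption 2: nothing can be sensed on a bacteria-free patch.
        return "non-sensing"
    # Everything else stays unlabeled until the classifier is learned.
    return None
```

      Only encounters that satisfy one of the two assumptions receive a seed label; all remaining encounters are left for the classifier to assign.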

      Based on the authors' result, worms behaviorally exhibit their preferences toward food abundance (density), which results in a preference scale for a range of densities. Does this scale vary with the worms' initial cultivation states? The author partially verified that by observing starved worms. This hypothesis could be better tested if the authors could analyze the decision-making of the worms that were initially cultivated with different densities of bacterial food.

      While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is a very interesting experiment, it is not feasible at this time. We focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction as described above in our response to Reviewer #1 under Cultivation density.

      It would be helpful to elaborate more on how the framework developed in this paper can be applied more broadly to other behaviors and/or organisms and how it may influence our understanding of decision-making across species.

      We thank the reviewer for alerting us to this gap in our discussion. We have added additional commentary about our model and its utility to the discussion section (lines 667-695).

      Reviewer #3 (Recommendations for the authors):

      Sensing vs. non-sensing

      Perhaps a more ethologically accurate term to describe this behavior would be "ignoring" rather than "not sensing". If the authors feel strongly about using the term "not sensing", then they should provide experimental evidence supporting this claim. However, I think simply changing the terminology negates the need for these experiments.

      We thank the reviewer for their thoughtful comments. While we agree with the reviewer that the term “non-sensing” may not be ethologically accurate (see response to Public Review above under Interpretation of “non-sensing” encounters), we interpret the term “ignoring” to mean that the animal sensed the patches but decided not to react. We have chosen to replace the term “non-sensing” with “non-responding” to best indicate the ethological interpretation of our observation. Nonetheless, we believe that it remains possible that animals are truly not sensing the bacterial patches as our method of classification compared the behavior against encounters with patches lacking bacteria (as described above in response to Reviewer #2 under Semi-supervised QDA labelling).

      History-dependence of the GLM

      Perhaps a simpler approach would be to say the worm senses everything, and this accumulative memory affects the decision to exploit. For example, the animal essentially experiences two feeding states: feeding on patches, and starvation off of patches.

      The level of satiety could be modeled linearly:

      Satiety(t_enter:t_leave) = k_feed*patch_density*delta_t

      Where k_feed is some model parameter for rate of satiety signal accumulation, t_enter is the time the animal entered the patch, t_leave is the time the animal left the patch, and delta_t is the difference between the two. Perhaps you could add a saturation limit to this, but given your data, I doubt that is the case.

      Starvation could be modeled as simply a decay from the last satiety signal:

      Starvation(t_leave:t_enter) = Satiety(t_leave)*exp(-k_starve*delta_t).

      Where k_starve is the rate constant for the decay of the satiety signal.

      For the logistic model, the logistic parameter is simply the difference between the current patch density and the current satiety signal.

      A nice thing about this approach is that it negates the need to categorize your patches. All patch encounters matter. Brief patch encounters (categorized as non-sensing and not used in the prior GLM) naturally produce a very small satiety signal and contribute very little to the exploit decision. Another nice thing about this approach is that it gives you memory timescales, that are testable. There is a rate of satiety accumulation and a rate of satiety loss. You should be able to predict behavior with lower patch density, assuming the rate constants hold. (I am not advocating you do more experiments here, just pointing out a nice feature of this approach).
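
      The model proposed above can be sketched in a few lines. This is a minimal illustration of the reviewer's equations only: the function names are hypothetical, and the intercept and slope of the logistic step (beta0, beta1) are assumptions added here so that the density-minus-satiety difference can be mapped to a probability.

```python
from math import exp


def satiety_gain(k_feed, patch_density, delta_t):
    """Satiety accumulated on a patch: k_feed * patch_density * delta_t."""
    return k_feed * patch_density * delta_t


def decayed_satiety(satiety_at_leave, k_starve, delta_t):
    """Satiety remaining after delta_t off food: exponential decay at rate k_starve."""
    return satiety_at_leave * exp(-k_starve * delta_t)


def exploit_probability(patch_density, satiety, beta0=0.0, beta1=1.0):
    """Logistic function of (current patch density - current satiety signal).

    beta0 and beta1 are assumed free parameters, not part of the proposal above.
    """
    x = beta0 + beta1 * (patch_density - satiety)
    return 1.0 / (1.0 + exp(-x))
```

      In this form a brief encounter (small delta_t) adds little satiety, and k_feed and k_starve are the two timescales that could be tested experimentally.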

      You could possibly apply this to a GLM for velocity on a non-exploited patch as well, though I assume this would be a linear GLM, given the velocity distributions you provided.

      We thank the reviewer for their time and thoughtfulness in thinking about our model. The reviewer’s proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making. However, we decided to keep our paper focused on using a minimal model to answer a set of core questions (e.g., Does encounter history or satiety influence decision-making?) (see above under Model design for a more detailed response). Future studies investigating the mechanisms of these foraging decisions should open the door for more mechanistically accurate models. We have expanded our discussion of the model to include this assertion (lines 667-695).

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sample size: If the sample size of the study is increased, more confidence and new insights can be inferred about myometrial enhancer-mediated gene regulation in term pregnancy. Such a small sample size (N = 3) limits the statistical power of the study. As mentioned in the manuscript, the failure to identify chromatin loops in the second subject's biopsy was due to limited sample.

      We agree with the reviewer’s comment about the sample size. We sincerely hope that the results of this study will increase stakeholders’ interest in funding future projects on a larger scale.

      (2) Figure quality: There is a lack of good representations of the results (e.g., screenshots of tables as figure panels!) as well as missing interpretations that might add value to the manuscript.

      Figures 1B and 2B have been converted to pie charts.

      (3) Definition of super-enhancer: The definition of super-enhancer is not clear. Also, the computational merging of enhancers to define super-enhancers should be described better.

      We have added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
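
      The merging rule quoted in the Methods text above can be sketched in plain Python. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, peaks are represented as (chromosome, start, end) tuples, and peaks separated by a gap of at most 12,500 bp are treated as mergeable, which is how the “-d 12500” setting is assumed to behave.

```python
def merge_enhancers(peaks, max_gap=12_500):
    """Merge (chrom, start, end) peaks whose gap is at most max_gap bp."""
    merged = []
    for chrom, start, end in sorted(peaks):
        same_chrom = bool(merged) and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= max_gap:
            # Extend the previous region to cover this peak.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def super_enhancers(peaks, max_gap=12_500, min_size=15_000):
    """Keep merged regions strictly larger than min_size bp."""
    return [
        (chrom, start, end)
        for chrom, start, end in merge_enhancers(peaks, max_gap)
        if end - start > min_size
    ]
```

      For example, two peaks on the same chromosome that lie 5 kb apart are merged into one region, but that region counts as a super enhancer only if its total span exceeds 15 kb.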

      (4) Assay-Specific Limitations: Each assay employed in the study, such as ChIP-Seq and CRISPRa-based Perturb-Seq, has its limitations, including potential biases, sensitivity issues, and technical challenges, which could impact the accuracy and reliability of the results. These limitations should be addressed properly to avoid false-positive results and improve the interpretability of the results.

      The major limitations of the CRISPRa-based Perturb-Seq protocol in this study are the use of hTERT-HM cells and the two-vector system for transduction. While hTERT-HM cells are a much easier platform in terms of technical operation, primary human myometrial cells are generally considered to retain a molecular context that is closer to that of in vivo tissues. Because of the limited efficiency of having two vectors simultaneously present in the same cell, conducting the experiment in hTERT-HM cells is much more affordable and operationally feasible. Future advances in viral vector payload capacity may overcome this challenge and open an avenue to performing the assay in primary human myometrial cells.

      (5) Sample collection and comparison: There is mention of matched gravid term and non-gravid samples whereas no description or use of control samples was found in the results. Also, the comparison of non-labor samples with labor samples would provide a better understanding of epigenomic and transcriptomic events of myometrium leading to laboring events.

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Lack of clarity:

      (6a) It is written as 'Chromatin Conformation Capture (Hi-C)'. I think Hi-C is Histone Capture and 3C is Chromosome Conformation Capture! This needs clear writing.

      As the reviewer suggested, to make it clear, we have changed the text “A high throughput chromatin conformation capture (Hi-C) assay” to “A High-throughput Chromosome Conformation Capture (Hi-C) assay”.

      (6b) In multiple places, 'PLCL2' gene is written as 'PCLC2'.

      Corrected as suggested.

      (6c) What is the biological relevance of considering 'active' genes with FPKM ≥ 1? This needs clarification.

      In RNA-seq analysis, gene expression levels are often quantified using FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Setting an FPKM threshold for defining "active" genes is biologically relevant because it helps to distinguish genuinely expressed genes from background noise and lets researchers focus on genes that are more likely to have a significant biological impact. A common threshold for defining "active" genes is FPKM ≥ 1. Genes with FPKM values below this threshold may be transcribed at very low levels or could be background noise.
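
      As a worked illustration of the definition, the sketch below computes FPKM from raw counts and applies the threshold of 1. The function names and gene labels are hypothetical, and real pipelines typically obtain these values from a quantification tool rather than computing them by hand.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def active_genes(fpkm_by_gene, threshold=1.0):
    """Names of genes whose FPKM is at or above the threshold, sorted."""
    return sorted(gene for gene, value in fpkm_by_gene.items() if value >= threshold)
```

      For instance, 100 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments gives an FPKM of 2.5, which would pass the threshold.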

      (6d) The understanding of differentially methylated genes at promoters is underrated as per the authors. However, it is unclear why, leaving DNA methylation aside, they selected histone modification as the basis of epigenetic reprogramming in the myometrium.

      DNA methylation indeed plays a crucial role in evaluating the impact of cis-acting elements on gene regulation. Large-scale studies, such as the comprehensive analysis of the myometrial methylome landscape in human biopsies (Paul et al., JCI Insight, 2022, PMID: 36066972), have provided valuable insights. Future secondary analyses that integrate these methylome data with the histone modification and chromatin looping data contributed by our group and collaborators, and that leverage machine learning, are poised to further elucidate the mechanisms underlying myometrial transcriptional regulation.

      (6e) How does the identification of PGR as an upstream regulator of PLCL2 gene expression in human myometrial cells contribute to our understanding of progesterone signaling in myometrial function?

      In a previous study, we demonstrated a positive correlation between PLCL2 and PGR expression in a mouse model and identified PLCL2's role in negatively modulating oxytocin-induced myometrial cell contraction (Peavy et al., PNAS, 2021, PMID: 33707208). The present study builds on this by providing evidence for a direct regulatory mechanism in which PGR influences PLCL2 transcription, likely through a cis-acting element located 35 kb upstream. These findings suggest that PLCL2 acts as a mediator of PGR-dependent myometrial quiescence prior to labor, rather than merely participating in a parallel pathway. Further in vivo studies are necessary to delineate the extent to which PLCL2 mediates PGR activity, particularly the contraction-dampening function of the PGR-B isoform.

      (7) Grammatical error: The manuscript has numerous grammatical errors. Please correct them.

      Corrections have been made as suggested.

      (8) Use of single-cell data: Though from the Methods section, it can be understood that single-cell RNA-seq was done to identify CRISPRa gRNA expressing cells to characterize the effect of gene activation, some results from single-cell data e.g., cell clustering, cell types, gRNA expression across clusters could be added for better elucidation.

      As the reviewer suggested, we have prepared a file “PerturbSeq_summary.xlsx” (Dataset S9) to provide additional results of the Perturb-seq data analysis. It includes two spreadsheets: “Cell_per_gRNA” for clustering and “Protospacer_calls_per_cell” for gRNA expression across clusters.

      Reviewer #2 (Recommendations For The Authors):

      (1) The following are a number of grammatical issues in the abstract. I suggest having a careful read of the entire manuscript to identify additional grammatical issues as I may not be able to highlight all of these issues.

      (1a) "The myometrium plays a critical component during pregnancy." change component to role.

      (1b) "It is responsible for the uterus' structural integrity and force generation at term," → replace "," with "."

      (1c) Also, I suggest rephrasing the first 2 sentences to: The myometrium plays a critical role during pregnancy as it is responsible for both the structural integrity of the uterus and force generation at term.

      (1d) "Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping." Remove "the", and modify to "Here we investigated human term pregnant".

      (1e) Missing period and sentence fragment: "PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2."

      Corrections have been made as suggested.

      (2) Sentence fragment: Studies on the role of steroid hormone receptors in myometrial remodeling have provided evidence that the withdrawal of functional progesterone signaling at term is due to a stoichiometric increase of progesterone receptor (PGR) A to B isoform-related estrogen receptor (ESR) alpha expression activation at term. (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).

      The statement has been updated:

      “Studies on the role of steroid hormone receptors in myometrial remodeling suggest that the withdrawal of functional progesterone signaling at term results from a stoichiometric shift favoring the PGR-A isoform over PGR-B. This shift is associated with increased activation of estrogen receptor alpha (ESR1) expression at term (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).”

      (3) FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as Cx43 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993).

      Use Gja1 (Gap junction alpha 1) as the current correct gene, not Cx43.

      Also, several references predate Nadeem, Farine et al. 2018 and are more appropriate to use as references for the role of Ap-1 proteins in regulating Gja1; PMID: 15618352 and PMID: 12064606 were the first to show this relationship in myometrial cells.

      The statement has been updated as suggested:

      “FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as GJA1 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993)”

      (4) Define PLCL2 on first use.

      Updated as suggested.

      (5) There are a number of issues with this section, "Matched sSpecimens of gravid myometrium were collected at the margin of hysterotomy from women undergoing clinically indicated cesarean section at term (>38 weeks estimated gestation age) without evidence of labor. Specimens of healthy, non-gravid myometrium were also pecimens were collected from uteri removed from pre-menopausal women undergoing hysterectomy for benign clinical indications."

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 (Heinz, Benner et al. 2010).

      Please clarify what background is used for motif enrichment.

      We used the default background sequences that HOMER generates from random genomic sequences matched to the input sequences in basic properties such as GC content and length. We have added more detail to the Methods section:

      “DNA-binding factor motif enrichment analysis

      Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 with default background sequences matching the input sequences (Heinz, Benner et al. 2010).”

      (7) "Six of the seven regions are also co-localized with previously published genome occupancy of transcription regulators curated by the ReMap Atlas"

      Please clarify if this Atlas includes myometrial tissues or not and clarify the cell types included in the atlas.

      According to the UCSC Genome Browser and the reference by Hammal et al. (2022), the current ReMap database includes PGR ChIP-seq data from human myometrial biopsies, available under NCBI GEO accession number GSE137550, alongside data from various other cell and tissue types. ReMap provides valuable insights into potential functional cis-acting elements in the genome from a systems biology perspective. However, tissue specificity requires independent validation.

      (8) "Notably, 76% of the putative super-enhancers are co-localized with known PGR-occupied regions in the human myometrial tissue (Figure S2). This is significantly higher than the 20% co-localization in the regular enhancer group (Figure S2)."

      Because there is a huge difference in the size of the putative super enhancer regions and the isolated enhancers this comparison is not appropriate as conducted. The comparison needs to account for the difference in size of the regions. Please provide P values for significance statements.

      We acknowledge the reviewer's concern that our initial statement was overstated and potentially misleading, given the substantial difference in size between putative super-enhancer regions and regular enhancers. Rather than emphasizing the enrichment, it would be more accurate to simply describe our observation that super-enhancers encompass more PGR-occupied regions.

      Here is the updated version:

      “Notably, 76% of the putative super-enhancers co-localize with known PGR-occupied regions in human myometrial tissue, compared to 20% co-localization observed in regular enhancers (Figure S2).”

      Reviewer #3 (Recommendations For The Authors):

      (1) Title is extremely misleading, as here we do not get a view of the epigenomic landscape, but rather sparse data related to H3K27ac and H3K4me (focusing on enhancers) and chromatin conformation associated with the PLCL2 transcription start site (TSS).

      As suggested, the title is modified to “Assessment of the Histone Mark-based Epigenomic Landscape in Human Myometrium at Term Pregnancy”.

      (2) Improve the first result paragraph by providing a clear rationale for the experiments and their objectives, as well as introducing the samples used. Rather than simply listing approaches and end results in Table 1, offer concise explanations for the experiments alongside the supporting data presented in detailed figures. Using appropriate figures/graphs to effectively contextualize these datasets would be greatly appreciated by readers and would add more value to this research. Currently, it is difficult for us to assess and appreciate the quality of the data.

      The following statement is included at the beginning of the Results section:

      "To better understand the regulatory network shaping the myometrial transcriptome before labor, we analyzed transcriptome and putative enhancers in individual human myometrial specimens. Using RNA-seq, we identified actively expressed RNAs, while ChIP-seq for H3K27ac and H3K4me1 was used to map putative enhancers. Active genes were associated with nearby putative enhancers based on their genomic proximity. Additionally, chromatin looping patterns were mapped using Hi-C to further link active genes and putative enhancers within the same chromatin loops."

      (3) The statistics for every sequencing approach need to be provided for each sample (e.g., RNA-seq: number of total reads, number of mapped reads, % of mapped reads; ChIP-Seq: number of mapped reads, % of mapped reads, % of duplicates).

      We have generated the summary table of each dataset included in this study (Dataset S7) [NGS-summary.xls].

      (4) Figure S1: The rationale behind comparing the Dotts study and yours regarding H3K27ac-positive regions needs to be better defined. Why is this performed if the data will not be used afterwards? What are the conserved regions associated with vs the ones that are variable? Is this biologically relevant? Why not use only the regions conserved between the 6 samples, to have more robust conclusions?

      The purpose of comparing our data with the Dotts dataset is to highlight the degree of variation across studies. In this study, we focused on addressing specific biological questions using our own dataset rather than developing methodologies for meta-analysis. Future advancements in meta-analysis techniques could leverage the combined power of multiple datasets to provide deeper insights.

      (5) Perhaps due to a lack of details, I am unable to ascertain how the putative myometrial enhancers were defined. In Dataset S1, it is stated, "we define the regions that have overlapping H3K27ac and H3K4me1 marks as putative myometrial enhancers at the term pregnant nonlabor stage (Dataset S1)". Within Dataset S1, for subjects 1, 2, and 3, H3K27ac and H3K4me1 double-positive enhancers are shown in term pregnant, non-labor human myometrial specimens, with approximately 100 regions corresponding to 131 (sample 1), 127 (sample 2), and 140 (sample 3) common peaks. However, in Figure 1a, reference is made to the 13114 putative enhancers commonly present across the three specimens. Is Dataset S1 intended to represent only a small fraction of the 13114 putative enhancers? Detailed analyses need to be conducted and better showcased.

      Dataset S1 has been updated to list all 13,114 putative enhancers.

      (6) For the gene expression analyses of RNA-seq data, FPKM values were utilized. However, it is unclear why the gene expression count matrix was normalized based on the ratio of total mapped read pairs in each sample to 56.5 million for the term myometrial specimens. I would recommend exercising caution regarding the use of FPKM expression units, as samples are normalized only within themselves, lacking cross-sample normalization. Consequently, due to external factors unaccounted for by this normalization method, a value of 10 in one sample may not equate to 10 in another.

      We value the reviewer’s input. This question will be addressed in future secondary data analyses with suitable methodologies, as it is beyond the scope of this study.

      (7) In Figure 1b, the authors have categorized their 12157 active genes into 3 bins based on FPKM values: >5 FPKM >1, >15 FPKM >5, and >15 FPKM. However, in the text, they describe these as 'actively high-expressing genes (FPKM >= 15)'. I would advise caution regarding the interpretation of these values, as an FPKM of 15 is not typically associated with highly expressed genes. According to literature and resources such as the Expression Atlas, an FPKM of 15 is generally considered to represent a low to medium expression level.

      We appreciate the reviewer’s feedback. This question will be revisited during secondary data analyses using appropriate methodologies, as it falls outside the scope of the present study.

      To increase readability and clarity, we modified the sentence as follows: More than 40% of the 540 putative super enhancers are located within a 100-kilobase distance to high-expressing genes (FPKM >= 15), while only 7.3% of putative myometrial super enhancers are found near low-expressing genes (5 > FPKM >=1) (Figure 2B).

      (8) Out of the 12157 active genes, approximately two-thirds have an FPKM >15. Was this expected? How does this correspond to what is observed in the literature, particularly in other similar studies (https://pubmed.ncbi.nlm.nih.gov/30988671/; https://pubmed.ncbi.nlm.nih.gov/35260533/)?

      This is indeed an intriguing question that merits further exploration in future secondary analyses.

      (9) It is also surprising to see that for the motif enrichment analysis (Fig. 1C), the P-values are small. This is probably because the percentage of target sequences with the motif is very similar to the percentage of background sequences with the motif. For instance, for selected genes in Figure 1C: AP-1 (50.68% vs. 46.50%), STAT5 (28.08% vs. 25.04%), PGR (17.90% vs. 16.12%), etc. Can one really say that you have a biologically relevant enrichment for values that are so close between target sequences and background sequences?

      The reviewer’s comment is noted. Biological relevance will be examined experimentally through wet-lab assays in future studies.

      (10) For Figure 2, again not convinced that FPKM >= 15 can be used to say: Compared with the regular putative enhancers, the putative myometrial super-enhancers are found more frequently near active genes that are expressed at relatively higher levels (Figure 1B and Figure 2B). A higher threshold should be used if they want to say this.

      To compare the association of putative enhancers with active genes expressed at different levels, we categorized the active genes into three groups based on their FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values. These groups are defined as follows: the top third active genes (FPKM ≥ 15), the middle third active genes (5 ≤ FPKM < 15), and the bottom third active genes (1 ≤ FPKM < 5). By "active genes expressed at relatively higher levels," we refer specifically to the top third active genes with FPKM values of 15 or higher, indicating their relatively higher expression levels compared to the other groups of active genes.
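
      The three expression groups described above can be sketched as a small classifier. This is a minimal illustration, not the authors' code: the function name, the tier labels, and the example FPKM values are hypothetical, and only the cutoffs (1, 5, and 15 FPKM) come from the response.

```python
def expression_tier(fpkm):
    """Assign one gene to an expression tier using the cutoffs quoted above.

    Returns None for genes below the active-gene cutoff (FPKM < 1).
    The labels "high", "middle" and "low" are illustrative names only.
    """
    if fpkm >= 15:
        return "high"
    if fpkm >= 5:
        return "middle"
    if fpkm >= 1:
        return "low"
    return None


# Hypothetical FPKM values, chosen to sit on and around the boundaries.
example_fpkm = {"GENE_A": 42.0, "GENE_B": 15.0, "GENE_C": 7.3,
                "GENE_D": 1.0, "GENE_E": 0.4}
tiers = {gene: expression_tier(value) for gene, value in example_fpkm.items()}
```

      Each boundary value falls into the higher tier, matching the inequalities as written (FPKM ≥ 15, 5 ≤ FPKM < 15, 1 ≤ FPKM < 5).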

      (11) More detailed explanations and methods are needed regarding how the data for Figure S2 was obtained.

      The following details were added to the methods section:

      “Colocalization of super enhancers and PGR genome occupancy was compared by calling peaks from previously published PGR ChIP-seq data (GSM4081683 and GSM4081684). The percentages of enhancers and super enhancers that manifest PGR occupancy were calculated by overlapping the genomic regions in each category with PGR occupancy regions.”
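
      The overlap calculation described in the added methods text can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: the function names and coordinates are hypothetical, intervals are treated as half-open, and a region is counted as occupied if it shares at least one base with any peak, a threshold the response does not specify.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_occupied(regions, peaks):
    """Percentage of regions that overlap at least one peak."""
    if not regions:
        return 0.0
    hits = sum(1 for region in regions if any(overlaps(region, p) for p in peaks))
    return 100.0 * hits / len(regions)


# Hypothetical enhancer regions and PGR peaks.
regions = [("chr1", 100, 200), ("chr1", 500, 900),
           ("chr2", 100, 200), ("chr2", 1000, 1500)]
pgr_peaks = [("chr1", 150, 160), ("chr2", 1400, 1600), ("chr3", 100, 200)]
occupied_pct = percent_occupied(regions, pgr_peaks)
```

      Under an any-overlap criterion, longer regions have more opportunity to contain a peak, which is the size effect raised by Reviewer #2 in point (8).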

      (12) In Figure 2C, there is no information provided on the genes used to obtain the results. It would be helpful to include examples of these genes, along with their expression values, for instance.

      The expression levels of the 346 active genes that are associated with myometrial super enhancers are included in Dataset S4, along with results of the updated gene ontology enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) Knowledgebase v2024q4. Selected pathways of interest are listed in the updated Figure 2C.

      (13) The linking of PLCL2-related data to the first part of the story is lacking, and the rationale behind it is missing. This entire section should be more detailed, and the data should be expanded to better reflect the context.

      As suggested, we included the following statement at the beginning of the section “Cis-acting elements for the control of the contractile gene PLCL2”:

      “We previously demonstrated the positive correlation of PLCL2 and PGR expression in a mouse model and PLCL2’s function on negatively modulating oxytocin-induced myometrial cell contraction (Peavy et al., 2021). However, the mechanism underlies the PGR regulation of PLCL2 remains unclear. Taking advantage of the mapped myometrial cis-acting elements, we aimed to identify the cis-acting elements that may contribute to the PLCL2 transcriptional regulation with a special interest on the PGR-related enhancers.”

      The context is that our results provide additional evidence for a direct regulatory mechanism of PGR on PLCL2 transcription, likely through the cis-acting element 35 kb upstream. This finding suggests that PLCL2 likely acts as a mediator of PGR-dependent myometrial quiescence before labor rather than as a mere passenger on a parallel pathway. Further studies using in vivo models are needed to determine the extent to which PLCL2 mediates PGR activity, especially the contraction-dampening function of the PGR-B isoform.

      (14) The entire Hi-C data should be presented to allow for the assessment of its quality and further value.

      The revised manuscript has included the Hi-C quality control summary in Dataset S8 [HiC-QC-Summary.xlsx].

      (15) The authors state: "For the purpose of functional screening, we focus on H3K27ac signals instead of using H3K27ac/H3K4me1 double positive criterium to cast a wider net." However, it is unclear how many of the targeted regions contained H3K27ac/H3K4me1 peaks. Were enhancers or super-enhancers targeted, and if so, how did they compare to H3K27ac sites?

      The numbers of H3K27ac/H3K4me1 double positive peaks are recorded in Figure 1A. Compared to the numbers of H3K27ac intervals (Table 1), the H3K27ac/H3K4me1 double positive peaks are 62.9%, 70.7%, and 61.2% of corresponding H3K27ac intervals in each individual specimen.

      (16) For the first set of data (Table 1), the authors state, "Together, these results reveal an epigenomic landscape in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition." While it is acknowledged that an epigenetic landscape exists in all tissues, there is a lack of clarity regarding this landscape in the current manuscript, as we are only presented with a table containing numbers.

      This sentence has been revised to: “Together, these results delineate a map of H3K27ac and H3K4me1 positive signals in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition.”

      (17) For S1, the authors conclude: These data together highlight the degree of variation in mapping the epigenome among specimens and datasets. This conclusion seems somewhat perplexing, and I find myself in partial disagreement. Firstly, providing a clear rationale for this section would strengthen the conclusions. It's important to consider what factors may contribute to this variability. It could simply be attributed to differences in experimental settings, such as variations in samples, protocols used, antibodies, sequencing departments, or overall data quality. Deeper analyses of the data could have provided more information.

      We agree with the reviewer that deeper analyses are needed in order to extract more information among studies. However, appropriate methods for meta-analyses should be carefully evaluated and employed for this purpose. We humbly believe that such a task should belong to future studies that may combine available datasets for secondary analyses, leveraging the collective contribution of the reproductive biology community.

      (18) In the methods section, please include an explanation of how enhancers and super-enhancers were defined or add appropriate citations for reference.

      We added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
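
      The merge-and-filter rule quoted in the methods text above can be sketched in plain Python. This is only an approximation of what `bedtools merge -d 12500` followed by a size filter would do: the function names and peak coordinates are hypothetical, boundary handling may differ slightly from bedtools, and the final step of taking super enhancers common to multiple samples is not shown.

```python
def merge_nearby(intervals, max_gap=12_500):
    """Merge same-chromosome intervals separated by a gap of at most max_gap bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def call_super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Keep merged regions that are larger than min_size bp."""
    return [(c, s, e) for c, s, e in merge_nearby(intervals, max_gap)
            if e - s > min_size]


# Hypothetical H3K27ac peaks from one sample.
peaks = [("chr1", 1_000, 3_000), ("chr1", 10_000, 12_000),
         ("chr1", 20_000, 22_000), ("chr1", 60_000, 62_000),
         ("chr2", 5_000, 6_000)]
merged_regions = merge_nearby(peaks)
super_enhancers = call_super_enhancers(peaks)
```

      In this toy input, the first three chr1 peaks merge into one 21 kb region that passes the 15 kb cutoff, while the isolated peaks stay below it.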

      (19) Additional description in the "Inferred myometrial PGR activities and the correlation analysis" method section should be included to enhance clarity and understanding.

      The description has been updated:

      “The inferred PGR activities were represented by the T-score, which was derived by inputting the mouse myometrial Pgr gene signature, based on the differentially expressed genes between control and myometrial Pgr knockout groups at mid-pregnancy (Wu, Wang et al., 2022), into the SEMIPs application (Li, Bushel et al., 2021). The T-scores were computed using this signature alongside the normalized gene expression counts (FPKM) from 43 human myometrial biopsy specimens.”

      (20) How was the qPCR analysis performed? Was the ddCT method utilized, and was a reference gene used for control? Additional information would be beneficial.

      Relative mRNA levels were quantified by the standard curve method.

      The following details were added: “Relative levels of genes of interest were normalized to the 18S rRNA.”
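
      For readers unfamiliar with the standard curve method, a generic version can be sketched as below. It is not the authors' analysis: the function names, dilution series, and Ct values are hypothetical. The sketch fits Ct against log10 of the input quantity for a dilution series, converts sample Ct values back to quantities, and divides the target quantity by the 18S rRNA quantity.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the input quantity."""
    return 10 ** ((ct - intercept) / slope)


def relative_level(target_ct, reference_ct, target_curve, reference_curve):
    """Target quantity normalized to the reference gene (here 18S rRNA)."""
    return (quantity_from_ct(target_ct, *target_curve)
            / quantity_from_ct(reference_ct, *reference_curve))


# Hypothetical 10-fold dilution series, used for both curves for simplicity.
dilutions = [1000, 100, 10, 1]
curve = fit_standard_curve(dilutions, [20.0, 23.3, 26.6, 29.9])
level = relative_level(26.6, 23.3, curve, curve)
```

      Here the target sample interpolates to a quantity of about 10 and the reference to about 100, so the normalized level is about 0.1.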

      (21) Regarding the RNA-Seq analysis of Provera-treated human Myometrial Specimens, the continued use of FPKM is not ideal due to potential differences in RNA composition between libraries. Additionally, clarification is needed on why Cufflinks 2.0.2 was used, considering it is no longer supported.

      FPKM (Fragments Per Kilobase of transcript per Million mapped reads) is used in RNA-Seq analysis, because it allows for the normalization of gene expression data, accounting for differences in gene length and sequencing depth, and facilitates comparability across different genes and libraries. This makes it one of the essential tools for accurately measuring and comparing gene expression levels in various biological and clinical research contexts.

      Cufflinks was once a popular tool for RNA-seq data analysis, transcriptome assembly, and identification of differentially expressed genes (DEGs). Its usage has declined in recent years with the emergence of newer and more advanced tools. We used it because the RNA-seq analysis at the early stage of this study was performed with it a few years ago, and we kept the same tool for the later RNA-seq analyses for comparison and consistency. If we were starting a new project now, we would choose newer tools such as HISAT2, Salmon, and DESeq2.
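
      The FPKM unit discussed in this exchange follows from its name: fragments per kilobase of transcript per million mapped fragments. The sketch below shows only that textbook definition, with a hypothetical function name and made-up numbers. It is not the Cufflinks estimation procedure, which also resolves ambiguous fragments among isoforms.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions_mapped


# Hypothetical gene: 500 fragments on a 2 kb transcript in a 50-million-fragment library.
example_value = fpkm(500, 2_000, 50_000_000)
```

      The only library-level term is the total mapped fragment count of the same sample, so the scaling is per sample; this is the property behind the reviewer's caution about comparing values between libraries.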

      (22) Overall, sentence structure and typos need to be corrected across the text. Here are some examples:

      Line 17: at term, emerging studies.

      Line 20-22: Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping.

      Line 30-32: PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Line 66-70: However, the role of differential myometrial DNA methylation at contractility-driving gene promoter CpG islands in preterm birth is not thought to be major (Mitsuya, Singh et al. 2014), but given that DNA methylation-mediated gene regulation often occurs outside of CpG islands (Irizarry, Ladd-Acosta et al. 2009), there is still work to be done at this interface.

      Line 80-83: Putative enhancers upstream of the PLCL2, a gene encoding for the protein PLCL2 which has been implicated in the modulation of calcium signaling (Uji, Matsuda et al. 2002) and maintenance of myometrial quiescence (Peavey, Wu et al. 2021), transcriptional start site were subject to functional assessment using CRISPR activation based assays.

      Line 290 : sSpecimens

      We appreciate the reviewer’s kind efforts and have made changes accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) The section on page 20 describing the proteomic analysis of EVs is poorly written and confusing, with a lot of data in the supplement. It is not clear what the proteomics data actually means.

      We appreciate your feedback on the clarity of the proteomic analysis section. We have rewritten the section on page 20 with more detailed information to provide a clearer explanation of the proteomics data and its biological significance. Additionally, we have incorporated a comparative analysis of the EV and total cell lysate proteomes (Fig. 8E, Supplementary Fig. S7A, Supplementary Tables 3 and 4) to aid data interpretation.

      (2) The order of the data could be improved.

      We appreciate your feedback regarding the data organization. We have reorganized the order and position of some data in a more structured and coherent manner, as suggested by the reviewers.

      - Reorganization of the qPCR data (previously Fig. 1C) as Fig. 3A

      - Removal of the data on the growth analysis on raffinose media (previously Fig. 7H).

      - Reorganization of the spotting data of the double mutant (previously Fig 3B) to Supplementary Fig. S3B

      - Reorganization of the subcellular localization data (previously Fig 3E) to Supplementary Fig. S4A

      (3) The discussion is repetitive with the introduction and merely summarizes the results and speculates on the mechanism of how the absence of UGGT, leading to ERQC defects, results in defective EV biogenesis/cargo loading in C. neoformans.

      We removed several repetitive sentences in the discussion and provided additional information on proteome analysis.

      Other questions and comments

      (1) Instead of comprehensively analyzing EVs from the UGG1 mutant, a more informative approach to better understanding how defects in N-linked glycosylation impact secretion, would be to do a proteomic analysis on the total secretions (including beta glucanase-treated cells to release classically secreted proteins from the cell wall) and EVs.

      We agree that a comprehensive proteomic analysis of total secretions and classically secreted proteins would provide deeper insights into how defects in N-glycosylation impact secretion in C. neoformans. To address this concern, we performed an additional set of proteomic analyses, profiling the total cell lysates and the secretome of C. neoformans cultivated in SD broth, and presented the results as Supplementary Table S5 and Supplementary Fig. S7B. These additional analyses provide further insights into the impact of UGG1 deletion on both conventional and unconventional secretion pathways, supporting a more pronounced effect of the UGG1 defect on EV-mediated trafficking. The discussion has been updated accordingly (Page 22, lines 509-514).

      (2) The melanization defect in Ugg1 mutant is not strong. Could the reduction be due to partially compromised Ugg1 mutant growth at 30 °C as indicated in the spot tests? Were photos of the spot dilution assays taken at 1 and 2 days to investigate slower growth? Or alternatively were growth curves taken in a liquid culture?

      To assess the melanin synthesis defect more accurately, in addition to the analysis on L-DOPA plates, we assessed melanin production in liquid L-DOPA medium after a 3-day incubation and normalized it to cell density (OD<sub>600</sub>). The normalized melanin production data are now included as Fig. 4B in the revised manuscript. The defective laccase activity in the _ugg1_Δ mutant (Fig. 7C) further corroborates our melanization assay results, as is now also mentioned in the text (Page 18, lines 393-395).

      (3) Is it accurate to say that some virulence factors (i.e. melanin, capsule and phosphatases) are predominantly trafficked through EVs in C. neoformans? Have studies been done to determine the proportion of virulence factors trafficked via EVs versus traditional secretion?

      We thank you for the thoughtful comments. Some virulence factors, such as urease, melanin and capsule polysaccharides, lack the signal peptide required for targeting to the conventional ER/Golgi secretion pathway. It is generally assumed that the trafficking of these factors in C. neoformans is predominantly mediated by non-conventional secretion via EVs. Additionally, even some virulence factors with signal peptides, such as laccase and phosphatases, are transported via EVs in addition to conventional secretion. A quantitative analysis comparing the proportion of virulence factors secreted via EVs versus the conventional pathway has not yet been reported, although genetic evidence suggests that conventional secretion also plays a significant role in the export of capsule polysaccharides. Thus, we were careful not to present EVs as the main route of virulence factor export in the manuscript.

      (4) There is insufficient background in the introduction linking what is known about the ERQC process to secretion in general. The topic changes from the ERQC process to fungal virulence factor, with a primary focus on non-classical (EV-based) secretion. Classical secretion should also be discussed without assuming that non classical (EV) secretion is the major pathway contributing to fungal virulence.

      We appreciate your insightful comments highlighting the need for more background on the ERQC process and its relationship with secretion. To address the reviewer’s concerns, we have added sentences to describe the key roles of ERQC in conventional protein secretion in the Introduction (Page 5, lines 102-106).

      (5) Figure 1A. What does the blue filled circle with the red outline signify? Fig 1 A legend is not well explained. A summary using material provided in the intro/discussion should be included to briefly explain the process and the differences between fungal species. Please also be aware that the intro starts describing the human ERQC process and then switches to what happens in S. cerevisiae.

      We have revised Figure 1A by removing the red circle and updated the figure legend in the revised manuscript to include more detailed information about the ERQC differences across higher eukaryotes and fungal species.

      (6) Figure 2A. There are no units on the Y-axis. Presumably, the scale is the same for all 3 strains.

      Thank you for your comments. The Y-axis is the same for all three strains and, as in Fig. 2C, represents the relative fluorescence intensity obtained from the HPLC analysis. We added the units on the Y-axis in Fig. 2A.

      (7) If Mnl1 and 2 have proposed roles in proteasomal degradation, wouldn't they be expected to have ER retention signals, like Ugg1?

      We appreciate your valuable insights regarding the absence of ER retention signals in Mnl1 and Mnl2. Previous studies have shown that Saccharomyces cerevisiae Mnl1/Htm1 does not possess canonical KDEL/HDEL-like ER retention signals. Instead, its retention in the ER lumen is facilitated through its interaction with protein disulfide isomerase Pdi1, which contains an HDEL sequence (Gauss et al. 2011). Thus, it is expected that non-canonical retention mechanisms—such as interactions with other ER proteins—could contribute to the retention of Mnl1 and Mnl2 within the ER. We added this information to the revised manuscript (Page 8, lines 154-159).

      (8) Figure 1 C qPCR showing change in mRNA in response to ER stress should not be grouped in this figure. It could be standalone or discussed when the spot dilution assays are performed. Anyway, spots tests are more convincing of a role in stress response than qPCR as the ugg1 mutant is sensitive to tunicamycin, DTT and cell wall stressing agents.

      As suggested by the reviewer, we have reorganized the qPCR data as a part of Figure 3 (Figure 3A) in the revised manuscript.

      (9) It is odd that mns1/101 mutants are not sensitive to ER and CW stress given their proposed differing location/function in the pathway (Figure 1) determined from the N-linked profiling. Any explanation? Could there be redundancy?

      We appreciate the reviewer’s observation regarding the lack of ER and CW stress sensitivity in the mns1Δ and mns101Δ mutants, despite their proposed roles in N-glycan processing. We previously reported that the C. neoformans alg3Δ mutant, which lacks a critical enzyme responsible for the synthesis of Dol-PP-Man<sub>6</sub>GlcNAc<sub>2</sub> in the N-glycosylation pathway, exhibited clearly impaired N-glycan elongation but showed no detectable growth defects in vitro, even under stress conditions. However, alg3Δ is avirulent in vivo (Thak et al., 2020). Similarly, the mns1Δ101Δ double mutant shows glycan-processing defects that do not compromise cellular fitness under stress conditions but result in attenuated virulence in animal models. These findings suggest that some glycosylation-related defects may affect in vivo pathogenicity more severely than in vitro stress sensitivity.

      (10) Although the Silver-stained gels of the ugg1 mutant are not particularly informative, why weren't they (and Con A blots) performed for the other mutants?

      The overall decrease in hypermannosylated glycans observed in the ugg1Δ mutant allowed us to detect clear alterations in protein glycosylation patterns in the lectin blot using Galanthus nivalis agglutinin, which recognizes terminal α1,2-, α1,3-, and α1,6-linked mannose residues. In contrast, the limited changes in a few glycan species in the other mutants, including mns1Δ, mns101Δ, and mns1Δ101Δ, are too subtle to be detected in the lectin blot, because the average lengths of their N-glycans differ only slightly from those of the WT. Therefore, we presented the lectin blotting data only for the ugg1Δ mutant.

      (11) If there is ER stress under normal conditions in the Ugg1 mutant then technically this mutant should be growing more slowly under normal conditions. This is difficult to predict in a spot dilution assay where growth is only visualized at day three when any growth defect may have been corrected. The slower growth rather than the reduced secretion of GXM specifically is therefore more likely to be responsible for the reduced virulence.

      We appreciate the reviewer’s insightful comment regarding the interplay between ER stress, growth defects, and virulence attenuation in the ugg1Δ mutant. While retarded growth in C. neoformans is often associated with reduced virulence, there are a few exceptions. For instance, disruptions in cell cycle progression in C. neoformans have been reported to result in larger capsule sizes, which instead enhance in vivo virulence when analyzed in Galleria mellonella infection models (García-Rodas et al., 2014). This highlights that a growth defect alone is not sufficient for virulence attenuation. In the case of the ugg1Δ mutant, we speculate that the almost complete loss of virulence is attributable not only to its growth retardation but also to its impaired secretion of key virulence factors, including the polysaccharide capsule.

      (12) The rationale for using leucine analogue 5',5',5'-trifluoroleucine (TFL), in a growth assay (Fig. 3C) to determine whether the defective ugg1Δ phenotypes are induced by ER stress caused by misfolded protein accumulation is not explained.

      The leucine analogue 5',5',5'-trifluoroleucine (TFL) can be incorporated into newly synthesized proteins, disrupting normal folding and thus leading to the generation of misfolded proteins (Trotter et al., 2002; Cowie et al., 1959). In the context of a defective ERQC pathway, these misfolded proteins cannot be adequately repaired, resulting in their accumulation and triggering ER stress. Excessive ER stress may ultimately inhibit cell growth in the presence of TFL. This explanation has been incorporated into the revised manuscript (Page 11, lines 236–241).

      (13) I would argue that only the Ugg1 and double Mns mutant were defective in virulence. For the single mutants, it looks like no difference was found relative to WT. The longer median survival of these mutants (if significant) is most likely due to poor infection technique.

      We agree with the reviewer’s opinion that the mns1Δ and mns101Δ single mutants show no significant difference in in vivo virulence compared to the WT strain, unlike the mns1Δ101Δ double mutant, which showed significantly attenuated virulence. We had previously addressed this in the manuscript (Page 13, lines 267-269).

      (14) The authors conclude that the ugg1Δ strain specifically is impaired in extracellular secretion of capsular polysaccharides but is this via classical (SAV1) secretion or EVs?

      In addition to EV-mediated transport, capsular polysaccharide secretion can occur via the Sav1 (Sec4p)-mediated classical secretion pathway. However, our proteome data of total cell lysates indicated that the protein levels of Sav1 were comparable between the WT and ugg1Δ strains, suggesting that Sav1p function itself might not be impaired. Given that the ugg1Δ mutant exhibits altered vesicular structures (Supplementary Fig. S6) and loss of microvesicles (Fig. 8A), we speculate that a defect might occur at a post-Sav1p step, such as vesicle fusion with the plasma membrane. Such a defect would likely contribute to the complete defect in secretion of capsular polysaccharides in the ugg1Δ strain, in which EV biogenesis and cargo loading are severely impaired, producing EVs that lack capsular polysaccharides (Figure 8F). However, further studies should be carried out to define the contribution of SAV1 to the secretion of capsular polysaccharides in the ugg1Δ strain.

      (15) The rationale for doing 7 H is very confusing.

      The experiment assessing raffinose utilization as a carbon source was inspired by the previous work of Garcia-Rivera et al., who reported that the cap59Δ mutant is unable to utilize raffinose due to a defect in the secretion of raffinose-hydrolyzing enzymes. As another way to probe potential defects in the conventional secretion pathway, we examined the growth of the ugg1Δ mutant in the presence of raffinose. Given the length of our manuscript, we have decided to remove these complementary data.

      (16) It is speculated in the discussion that ER stress impacts lipid/sterol synthesis and that LDs (lipid droplets?) aid the UPR and ERAD in degrading misfolded proteins during ER stress in S. cerevisiae. The authors mention that they observed a drastic increase in LDs in the ugg1Δ mutant. Where is this data? Even with the data, this is all speculation. The authors also speculate that increased numbers of vacuoles in ugg1 (where is the data?) could be the cause of the altered vesicular structures observed in the mutants, which may indicate abnormal lipid homeostasis caused by the ERQC defects, which could, in turn, affect EV biogenesis. Again, this is speculative.

      The data on lipid droplets (LDs) and vacuole staining are presented in Supplementary Figure S6, showing a drastic increase in LDs and an increase in vacuolar size in the ugg1Δ mutant compared to the wild-type strain, especially under capsule-inducing conditions. In addition to these changes in vesicular structures, our preliminary data on sphingolipid and sterol analysis in the surface lipid fraction of the ugg1Δ mutant led us to hypothesize that ERQC defects may impact lipid metabolism, which in turn could influence EV biogenesis and membrane properties. We expect these findings to provide a strong foundation for future studies exploring the link between ERQC, lipid homeostasis, and EV biogenesis. We have revised our speculation on the association between EV biogenesis and the abnormal lipid homeostasis caused by ERQC defects, adding information on our preliminary lipid profile data and mentioning that the ugg1Δ mutant lacks microvesicles, which are derived from the plasma membrane (Page 24, lines 554-559).

      Reviewer #2 (Recommendations for the authors):

      (1) My suggestions for the authors are the same as those presented in the public review: (1) reducing the text in certain sections of the paper to improve readability for the audience, and (2) reconsidering the figures to reduce the amount of information in each one, moving some of the content to the supplementary material.

      We thank the reviewer for their constructive suggestions regarding the organization and readability of the manuscript. As suggested, we addressed your concerns as follows:

      (1) Reducing the text in the Introduction, Results, and Discussion sections by removing repetitive statements and simplifying complex descriptions where possible.

      (2) Changing the presentation of figures: we have also reorganized the presentation of some data by moving non-essential data to the supplementary material. The updated figures and supplementary materials have been clearly referenced in the text to guide readers.

      (3) Reorganization of materials and methods: some parts of methods were moved to Supplementary Information

      (4) Removal of Figure 7H and the sentences describing the result

      More detailed explanations on the reduction and reorganization are also described in the response to the major comments (2) and (3) made by Reviewer #1.

      (2) Figure 3, for example, shows no difference in fungal growth under different cultivation conditions. This information is valuable but could be mentioned in the text, with the image provided as supplementary material, focusing the figure only on images that show significant growth differences among the strains. I suggest a similar approach for other figures so that the authors can include only the most relevant results in the main body of the article and move some figures to the supplementary materials.

      For Fig. 3, the spotting data of the double mutant (previously Fig. 3B) is now presented in the supplementary information (Supplementary Fig. S3B). Additionally, the subcellular localization data (previously Fig 3E) was also moved to the supplementary material (Supplementary Fig. S4A).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 43 "EV-mediated transport of virulence bags" doesn't make sense. EVs have been described as "virulence bags" (and are in this work later in the introduction) but this should here be "transport of virulence factors" or "compounds associated with virulence" but only if you have confirmed that the "cargo" is consistent with this- which is not evident in the abstract.

      Thank you for your insightful comment. We have revised this to "EV-mediated transport of virulence factors" in line with your suggestion.

      (2) Line 49 "secretory pathway" - is there not more than one secretion pathway?

      Thank you for pointing this out. The term "secretory pathway" has been updated to "secretory pathways" to acknowledge the presence of both conventional and unconventional secretion mechanisms.

      (3) Line 53 "recognizes folding defects, repairs them, and ensures the translocation of irreparable misfolded proteins" should be "recognizes folding defects and repairs them or ensures the translocation of irreparable misfolded proteins.

      Thank you for pointing this out. We have revised the sentence as you suggested.

      (4) Lines 88-90 ALG needs to be written out the first time - Asn-linked glycans. Also, consider adding that ALG genes are present in most eukaryotes as it is unclear what you are comparing C. neoformans to.

      Thank you for your helpful comment. We have revised the text to write out "ALG" as "Asn-linked glycosylation" and added the sentence “ALG genes are evolutionary conserved in most eukaryotes” in the revised manuscript (Page 4, line 84).

      (5) Line 99 Cryptococcus has already been abbreviated to C. so don't write it out again.

      We have corrected "Cryptococcus" to “C.” throughout the manuscript after its first mention.

      (6) Line 152- tunicamycin and DTT are not described yet, which may make it challenging for some readers to understand what these drugs are doing/why they were used. What is on lines 156 and 157 for these drugs should go up with the first mention of these drugs.

      Thank you for your helpful suggestion. We have revised the manuscript to include the descriptions and purpose of using tunicamycin (TM) and dithiothreitol (DTT) immediately following their first mention, as recommended (Page 10, lines 208-210).

      (7) The text for Figure 1 C is inaccurate. High temperature also induced KAR2, as noted above, but inaccurately stated in line 160. There is no comment on the significant UGG1 increase with tunicamycin or that KAR2 was highest in this condition.

      Thank you for your thoughtful comment. We have better clarified the significant increase of UGG1 expression following tunicamycin treatment and KAR2 induction upon heat stress in the revised manuscript (Page 10, lines 216-217). Please note that Fig. 1C was revised and is now referred to as Fig. 3A.

      (8) Figure 2B is not well explored/explained. There appears to be more protein in the mutant, including of higher weight in the intracellular compartment. It is difficult to ascertain if there is more too in the secretion phase with this gel. The methods do not specifically describe the concentration of protein added - just volume. Is what we are seeing a loading issue vs real differences?

      Thank you for your insightful comments regarding Figure 2B. We added information on amounts of protein (30 µg per lane) in the legend of Figure 2B.

      The main purpose of Fig. 2B is to examine the altered glycosylation pattern in the ERQC-defective mutant by detecting glycoproteins using Galanthus nivalis agglutinin, which specifically binds terminal α1,2-, α1,3-, and α1,6-linked mannose residues. The lectin blotting results indicated that glycoproteins are detected more abundantly in the secretion fraction than in the soluble intracellular fraction, consistent with the general notion that more than 50% of secretory proteins are glycoproteins. Also, the more abundant proteins with decreased molecular weight in the secretion fraction of the ugg1Δ mutant support the N-glycan profiles showing decreased hypermannosylation in the ugg1Δ mutant. We added the purpose and a more detailed interpretation of Figure 2B to the revised manuscript (Page 9, lines 174-179).

      (9) Line 242 "melanin pigment" is redundant as melanin is a pigment.

      We thank the reviewer for pointing out the redundancy in the phrase. We revised the text to simply state "melanin".

      (10) Line 250 drops "completely" especially as the mutant did colonize the lungs of mice.

      To avoid any possible misunderstanding, we removed the term "completely" in the revised manuscript.

      (11) Line 275- need to reference 18B7 as it is first introduced here.

      We added the reference on the antibody 18B7 in the revised manuscript.

      (12) Line 308- there are specific techniques to measure GXM size that could validate or refute the statement on "incomplete" polysaccharides. For example, DOI:10.1128/EC.00268-09.

      We appreciate the valuable suggestion on specific techniques to measure GXM size, which will be one of the key experiments in our future study. In the revised manuscript, we cited the suggested reference to indicate the need for validation of our statement (Page 14, lines 316-318).

      (13) Line 496 "mammals" - why is this used when the study is on a fungus, not a mammal? The structure of the first 2 paragraphs can be clearer to focus more on fungal biology.

      We compared mammals and fungi to emphasize that the ERQC system is conserved among eukaryotes but has diverged, with a few species-specific features. This comparison is relevant for understanding the evolutionarily unique features of ERQC pathways in C. neoformans. We modified the first 2 paragraphs to clarify the main issue of our present study (Page 21, lines 472-483).

      (14) Line 525- the ugg mutant was not avirulent as CFU was present and histopathology in the supplementary figures shows the tissue with ugg1 deletion was not normal (although the images are not especially easy to review). Yes, the mutant did not kill under your test conditions, but it was not avirulent (incapable of causing disease). Significantly attenuated or other descriptors should be utilized. Line 548 is also thus incorrect "complete loss of virulence").

      We appreciate the reviewer’s concern regarding the description of the ugg1Δ mutant as avirulent. We agree that the term "avirulent" alone may not fully capture the observed phenotypes in the CFU and histopathological data, since we cannot exclude the possibility that the ugg1Δ mutant retains the ability to establish an infection. Thus, we have revised the text to describe the ugg1Δ mutant as "almost avirulent".

      (15) Line 597- the study by Fukuoka used kidney cells. It is misleading to not clearly state that this finding of ER stress was NOT done in fungi as the way it is presented makes it read as if this work was performed in C. neoformans. This should be clarified. This should also be double-checked and clarified for other statements, such as the reference to Harada in line 606, as this study used melanoma cells. These cell types are very different from cryptococcus- though I absolutely concur that lessons can be learned from comparative assessments.

      We thank the reviewer for pointing out the need to clarify the experimental context of the cited studies. We explicitly stated the host cell types used in the referenced studies by Fukuoka et al. and by Harada et al., respectively, in the revised manuscript (Page 25, lines 560 and 568).

    1. Author response:

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In comparison, the slowing in the reaction times when the task required more operations was comparable to normal controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion on how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, in the discussion, we were somewhat confused about the interpretation of the result: If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should they not show problems both in the expected (1+3 =4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can you correctly say "yes" in the expected condition? No increase in error rate is observed - just slowing in the unexpected condition. If the patients make up for the lack of prediction by using some other strategy, why are they only slowing in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only involved in detecting the mismatch between predicted and real outcome, why would the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, our data suggest that the cerebellum is not essential for forming expectations but plays a critical role in processing their violations.

      Relatedly, two important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. As mentioned by the editors, solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a violation of expectation (VE), we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks, but are impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE, rather than to forming expectations.

      Importantly, while previous experimental manipulations(1–6) have provided important insights, some may have confounded these two internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions (e.g., correct trials) where there was no VE. In addition, other studies did not include a control measure (e.g., complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      In addition to the editors’ question, we would like to raise a second important question regarding cerebellar contributions to expectation-related processes. While our findings point to a unique and consistent cerebellar role in VE processes in sequential tasks, we do not aim to generalize this role to all forms of expectations(2,7,8). Another interesting process is how expectations are formed. Expectations can be formed by different processes(2,7,8), and this should be taken into account when defining cerebellar function. For instance, previous experimental paradigms(1–6), aiming to assess VE, utilized tasks that manipulated rule-based errors or probability-based errors, but did not fully dissociate these constructs. In our Experiments 1 and 2, we specifically manipulated error signals derived from previous top-down effects. However, in Experiment 3, the participant’s VE was derived from within-task processes. In Experiment 3, expectations were formed either by statistical learning or by rule-based learning. During the test stage, when evaluating sensitivity to correct and incorrect problems, the CA group showed deficits only when expectations were formed based on rules. These findings suggest that cerebellar patients may retain a general ability to form expectations. However, their deficit appears to be specific to processing rule-based VE, but not statistically derived VE. This pattern of results aligns with the results of Experiments 1 and 2 where the rules are known and based on pre-task knowledge.

      We suggest that these two key questions are relevant to both motor and non-motor domains and were not fully addressed even in the previous, well-studied motor domain. Thus, the current experimental design used in three different experiments provides a valuable novel experimental perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks.

      If feasible, we propose that future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow us, as a scientific community, to achieve a more detailed, well-defined cerebellar motor and non-motor mechanistic account.

      References

      (1) Butcher, P. A. et al. The cerebellum does more than sensory prediction error-based learning in sensorimotor adaptation tasks. J. Neurophysiol. 118, 1622–1636 (2017).

      (2) Moberget, T., Gullesen, E. H., Andersson, S., Ivry, R. B. & Endestad, T. Generalized role for the cerebellum in encoding internal models: Evidence from semantic processing. J. Neurosci. 34, 2871–2878 (2014).

      (3) Riva, D. The cerebellar contribution to language and sequential functions: evidence from a child with cerebellitis. Cortex. 34, 279–287 (1998).

      (4) Sokolov, A. A., Miall, R. C. & Ivry, R. B. The Cerebellum: Adaptive Prediction for Movement and Cognition. Trends Cogn. Sci. 21, 313–332 (2017).

      (5) Fiez, J. A., Petersen, S. E., Cheney, M. K. & Raichle, M. E. Impaired non-motor learning and error detection associated with cerebellar damage. A single case study. Brain 115 Pt 1, 155–178 (1992).

      (6) Taylor, J. A., Krakauer, J. W. & Ivry, R. B. Explicit and Implicit Contributions to Learning in a Sensorimotor Adaptation Task. J. Neurosci. 34, 3023–3032 (2014).

      (7) Sokolov, A. A., Miall, R. C. & Ivry, R. B. The Cerebellum: Adaptive Prediction for Movement and Cognition. Trends Cogn. Sci. 21, 313–332 (2017).

      (8) Fiez, J. A., Petersen, S. E., Cheney, M. K. & Raichle, M. E. Impaired non-motor learning and error detection associated with cerebellar damage. A single case study. Brain 115, 155–178 (1992).

      (9) Picciotto, Y. De, Algon, A. L., Amit, I., Vakil, E. & Saban, W. Large-scale evidence for the validity of remote MoCA administration among people with cerebellar ataxia. Clin. Neuropsychol. 0, 1–17 (2024).

      (10) Binoy, S., Montaser-Kouhsari, L., Ponger, P. & Saban, W. Remote Assessment of Cognition in Parkinson’s Disease and Cerebellar Ataxia: The MoCA Test in English and Hebrew. Front. Hum. Neurosci. 17, (2023).

      (11) Saban, W. & Ivry, R. B. Pont: A protocol for online neuropsychological testing. J. Cogn. Neurosci. 33, 2413–2425 (2021).

      (12) Algon, A. L. et al. Scale for the assessment and rating of ataxia: a live e-version. J. Neurol. (2025). doi:10.1007/s00415-025-13071-7

      (13) McDougle, S. D. et al. Continuous manipulation of mental representations is compromised in cerebellar degeneration. Brain 145, 4246–4263 (2022).

    1. Author response:

      eLife Assessment

      This important study uses an innovative task design combined with eye tracking and fMRI to distinguish brain regions that encode the value of individual items from those that accumulate those values for value-based choices. It shows that distinct brain regions carry signals for currently evaluated and previously accumulated evidence. The study provides solid evidence in support of most of its claims, albeit with current minor weaknesses concerning the evidence in favour of gaze-modulation of the fMRI signal. The work will be of interest to neuroscientists working on attention and decision-making.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We plan to undertake some additional analyses suggested by the Reviewers to bolster the evidence in favor of gaze-modulation of the fMRI signal.

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Thank you!

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.

      We thank the Reviewer for their thoughtful critique. We acknowledge that our formulation of GLM2 does not test for the effects of gaze-weighted value signals beyond the base effects of value, only in place of the base effects of value. In our revision, we plan to examine alternative ways of quantifying the relative importance of gaze in these results.  
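
      The difference between testing a gaze-weighted regressor in place of the base value regressor and testing it over and above that regressor can be illustrated with a small simulation. This is only a sketch and not the analysis from the manuscript: the variables `sv`, `gaze`, and `sv_gaze`, the simulated signal, and the use of ordinary least squares are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 2000

sv = rng.normal(size=n)                # hypothetical sampled value
gaze = rng.uniform(0.5, 1.0, size=n)   # hypothetical gaze weight
sv_gaze = sv * gaze                    # gaze-weighted value

# Simulated signal that depends on the UNWEIGHTED value only.
y = 1.0 * sv + rng.normal(scale=0.5, size=n)

# The two regressors are strongly correlated by construction.
r = np.corrcoef(sv, sv_gaze)[0, 1]

ones = np.ones(n)
# Model A: gaze-weighted regressor used in place of the base regressor.
beta_a = fit_ols(np.column_stack([ones, sv_gaze]), y)
# Model B: both regressors, so the gaze term is tested over and above value.
beta_b = fit_ols(np.column_stack([ones, sv, sv_gaze]), y)

print(f"corr(sv, sv_gaze) = {r:.2f}")
print(f"Model A: gaze-weighted beta = {beta_a[1]:.2f}")
print(f"Model B: base beta = {beta_b[1]:.2f}, gaze-weighted beta = {beta_b[2]:.2f}")
```

      In this toy setting, Model A yields a clearly non-zero gaze-weighted coefficient even though gaze plays no role in the simulated signal, whereas in Model B the gaze-weighted coefficient stays close to zero once the base regressor is included.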

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the pre-SMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We thank the Reviewer for their comments on the strengths of the manuscript, and for highlighting an important limitation. We agree that the number of participants in the study, after exclusions, was lower than in a typical fMRI study. However, it is important to note that we do have a lot of data for each subject. Due to our relatively fast, event-related design, we have on average 65 trials per subject (SD = 18) and 5.95 samples per trial (SD = 4.03), for an average of 387 observations per subject (SD = 18). Our model-based analysis looks for very specific neural time courses across these ~387 observations, giving us substantial power to detect our effects of interest. Still, we acknowledge that our small number of subjects limits our power and our ability to generalize to other subjects. We plan to add the following disclaimer to the Discussion section:

      “Together with our limited sample size (n = 23), we may not have had adequate statistical power required to observe consistent effects. Additional research with larger sample sizes is needed to resolve this issue.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception.

      Strengths:

      The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception.

      We thank the reviewer for their positive comments.

      Weaknesses:

      "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." (lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI).

      We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found:

      Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness.

      Related comment for the following excerpts:

      "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99).

      "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188-190).

      "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245-247).

      It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses". However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results.

      We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015) as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity.

      We added the following sentence to the discussion:

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      "When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117).

      Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined.

      This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads:

      When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods).

      During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation.

      The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception?

      These are excellent questions, the answers to which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion:

      Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022).

      Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus.

      "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216).

      It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant). This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified.

      We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects.
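
      For readers who want to check the binomial test quoted above (6 of 8 neurons, p = 0.145), the sketch below computes exact binomial p-values with the standard library only. The null probability of 0.5 and the one-sided versus two-sided variants are assumptions; the excerpt does not state which variant was used.

```python
from math import comb

def binom_pmf(n, p):
    """Probability of each outcome 0..n for X ~ Binomial(n, p)."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

def binom_upper_tail(k, n, p=0.5):
    """One-sided exact p-value: P(X >= k)."""
    return sum(binom_pmf(n, p)[k:])

def binom_two_sided(k, n, p=0.5):
    """Two-sided exact p-value: total probability of outcomes no more likely than k."""
    pmf = binom_pmf(n, p)
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))

p_one = binom_upper_tail(6, 8)
p_two = binom_two_sided(6, 8)
print(f"one-sided p = {p_one:.3f}")  # 37/256, prints 0.145
print(f"two-sided p = {p_two:.3f}")  # 74/256, prints 0.289
```

      Under these assumptions, the reported value matches the one-sided tail probability; a two-sided test on the same counts would give a p-value twice as large.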

      The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis?

      Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough:

      For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths).

      Therefore, we added the following clause to the legends of Figures 2, 3, 4 and S3:

      [...] for patients for which we could obtain anatomical images.

      Could the authors speak to the timing of the responses reported in Figure 4? The statistically significant intervals suggested both early (~160-200ms) to late responses (~300ms). Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception?

      We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.
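
      The fitting procedure described in this methods excerpt can be sketched as follows. This is an illustration under several assumptions, not the authors' code: the latencies are synthetic, the mixture density is fitted to a histogram of those latencies (the excerpt does not say which representation of the data was fitted), and SciPy's bounded `L-BFGS-B` optimizer is swapped in for the interior-point method. Latencies are expressed in seconds so that the objective is well scaled for the optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def gauss(x, mu, sd):
    """Normal density with mean mu and standard deviation sd."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, p):
    """Two-component Gaussian mixture; p = (weight, mu1, sd1, mu2, sd2)."""
    w, mu1, sd1, mu2, sd2 = p
    return w * gauss(x, mu1, sd1) + (1.0 - w) * gauss(x, mu2, sd2)

def single_pdf(x, p):
    """Single Gaussian; p = (mu, sd)."""
    mu, sd = p
    return gauss(x, mu, sd)

def best_fit(pdf, centers, density, bounds, n_runs, rng):
    """Minimize the mean square error between pdf and the histogram density,
    keeping the best of n_runs fits started from random points within bounds."""
    def mse(p):
        return float(np.mean((pdf(centers, p) - density) ** 2))
    best = None
    for _ in range(n_runs):
        x0 = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        res = minimize(mse, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

rng = np.random.default_rng(1)
# Synthetic latencies in seconds (NOT the study's data): an early and a late cluster.
latencies = np.concatenate([rng.normal(0.14, 0.02, 500), rng.normal(0.32, 0.03, 500)])
density, edges = np.histogram(latencies, bins=25, range=(0.0, 0.5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

two = best_fit(mixture_pdf, centers, density,
               [(0.05, 0.95), (0.0, 0.5), (0.01, 0.15), (0.0, 0.5), (0.01, 0.15)],
               20, rng)
one = best_fit(single_pdf, centers, density, [(0.0, 0.5), (0.01, 0.3)], 20, rng)

mu_early, mu_late = sorted([two.x[1], two.x[3]])
print(f"cluster means: {1000 * mu_early:.0f} ms and {1000 * mu_late:.0f} ms")
print(f"MSE ratio (one component / two components): {one.fun / two.fun:.1f}")
```

      The final ratio mirrors the check described in the excerpt, where the two-component fit had to be markedly (> 4 times) better than the single-component fit.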

      We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above):

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      Reviewer #2 (Public Review):

      The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit.

      We thank the reviewer for these positive comments.

      Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration:

      (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task.

      This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.”

      In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”. This last sentence now reads (to address a point made by Reviewer 1 about motor preparation):

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field.

      We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked.

      (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider.

      We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes.

      We added a sentence to this effect in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection.

      Strengths:

      There are two strongly valuable aspects in this study that make the evidence convincing and even compelling. First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure.

      We thank the reviewer for this positive evaluation.

      Weaknesses:

      Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials.

      The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added "(before baseline correction)" to the legend of Figure 3.

      I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. 
      Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories.

      We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. 

      We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C).

      We added the following figure to the supplementary information.
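
      Independently of that figure, the rate requested by the reviewer is a simple proportion, sketched below. The function name and the example counts are hypothetical and are not taken from the study.

```python
def correct_rejection_rate(n_correct_rejections, n_catch_trials):
    """Proportion of stimulus-absent (catch) trials reported as 'no stimulus'."""
    if n_catch_trials <= 0:
        raise ValueError("at least one catch trial is required")
    if not 0 <= n_correct_rejections <= n_catch_trials:
        raise ValueError("correct rejections must lie between 0 and the number of catch trials")
    return n_correct_rejections / n_catch_trials

# Hypothetical counts for one participant.
rate = correct_rejection_rate(18, 20)
print(f"correct rejection rate = {rate:.2f}, false alarm rate = {1 - rate:.2f}")
```

      Reporting the rate rather than the raw count makes participants with different numbers of catch trials directly comparable.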

      The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated).

      Thank you for spotting this; it has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. See the following comments for the methods that could benefit the present conclusions:

      Thank you for these suggestions that we believe improved our interpretations.

      Major Points

      (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures.

      We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.

      We also updated the discussion:

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      (2) We highly recommend that the authors consider employing some analysis that decodes the representations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions.

      We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room.

      (3) Although there are small populations recorded in each of the two subcortical structures, they are sufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example.

      We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified.

      (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections.

      Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc. 

      Minor Comments

      (1) Add the shared overlap between differently selective neurons explicitly in the manuscript.

      We added this information at the end of the results section.

      (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare?

      Sorry for this misunderstanding. We clarified this in revised methods:

      To deal with possibly non-parametric distributions, we used Wilcoxon rank sum test or sign test instead of t-tests to test differences between distributions. We used permutation tests instead of Binomial tests to test whether a reported number of neurons could have been obtained by chance.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analysis:

      As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories.

      We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5.

      Recommendations for improving the writing and presentation

      As mentioned in the public review, I think Figure 3 needs clarification. I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example:

      Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else?

      We agree that this was somewhat unclear. We modified the sentence to:

      permuting the sign of the trial-wise differences

      We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text.
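      The two permutation tests named above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the test statistic (absolute mean difference), the two-sided form, and the function names are assumptions; only the idea of permuting signs for paired data and trial labels for two-sample data comes from the response.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def sign_permutation_test(diffs, n_perm=1000, seed=0):
    """Paired test: randomly flip the sign of each trial-wise difference."""
    rng = random.Random(seed)
    observed = abs(_mean(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(_mean(flipped)) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_perm + 1)

def trial_permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sample test: shuffle trial labels between the two conditions."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(_mean(perm_a) - _mean(perm_b)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

      In this sketch, `diffs` would hold one post-stimulus minus baseline difference per trial, while `a` and `b` would hold per-trial firing rates for hits and misses.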

      Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300 ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information.

      We agree that the methods section was unclear. We re-wrote the following two paragraphs:

      To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity of catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants.

      For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate and we discarded all but the longest cluster.
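      The cluster criterion in the two paragraphs above can be sketched as follows. This is an illustrative reading, not the authors' code: the per-time-point p-values are assumed to be already computed, and the 1 ms time step, the 0.05 threshold, and all function names are assumptions. Only the 80 ms minimum length and the rule of keeping the longest cluster come from the text.

```python
def longest_cluster_ms(significant, dt_ms):
    """Length in ms of the longest run of adjacent significant time points."""
    longest = current = 0
    for flag in significant:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest * dt_ms

def is_selective(p_values, dt_ms=1.0, alpha=0.05, min_length_ms=80.0):
    """A neuron counts as selective when its longest cluster exceeds 80 ms."""
    significant = [p < alpha for p in p_values]
    return longest_cluster_ms(significant, dt_ms) > min_length_ms

def cluster_p_value(observed_length_ms, shuffled_lengths_ms):
    """Share of shuffled data sets whose longest cluster is at least as long."""
    exceed = sum(1 for length in shuffled_lengths_ms if length >= observed_length_ms)
    return (exceed + 1) / (len(shuffled_lengths_ms) + 1)
```

      Here `shuffled_lengths_ms` would hold the longest cluster found in each shuffled data set, so that observed and shuffled data are summarized in the same way.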

      Minor points:

      Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

      We added the text “[...], duration: 2.5 ms” in Figures 2, 3, and 4.

      Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset."

      We made the change.

      Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along this line: “these two subcortical structures both contain neurons displaying several different functional roles”

      Changed.

      Line 329: remove double “when”

      We made the change, thank you for spotting this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you for your valuable comments and suggestions, which have greatly contributed to improving our manuscript.

      We have carefully addressed all the reviewers' suggestions, and detailed responses for each Reviewer are provided at the end of this letter. In summary:

      • The Introduction has been revised to provide a more focused discussion on results, toning down the speculative discussion on seasonal host shifts.

      • The methodology section has been clarified, particularly the power analysis, which now includes a clearer explanation. The random effects in the models have been better described to ensure transparency.

      • The Results section was reorganized to highlight the key findings more effectively.

      • The Discussion has been restructured for clarity and conciseness, ensuring the interpretation of the results is clearer and better aligned with the study objectives.

      • Minor edits throughout the manuscript were made to improve readability and accuracy.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

      The introduction and discussion have been modified and toned down, and now stick to discussing the results, as suggested.

      Reviewer #1 (Recommendations for the authors):

      Some more minor comments are mentioned below.

      Line 51: Because 'of' this,

      Changed as suggested.

      Line 56: specialists 'or' generalists

      Changed as suggested.

      Line 56: primarily

      Changed as suggested.

      Line 98: Because 'of' this,

      Changed as suggested.

      Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      Changed as suggested.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      The power analysis was conducted a posteriori, as you correctly inferred. While I did not indicate that it was performed a priori, you are right in noting that this was not explicitly mentioned. As you suggested, the methodology for the power analysis has been revised to clarify any potential doubts.

      Regarding the model for hatchability, a model without a random effect variable was used, as all attempts to fit models with random effects resulted in poor validation. These points have now been clarified and explained in the corresponding section.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      Changed as suggested.

      (4) According to eLife policy, data must be made freely available (not just upon request).

      Data and code will be publicly available. The corresponding section was modified.

      Reviewer #2 (Recommendations for the authors):

      Your manuscript is much improved by the inclusion of two additional replicates! The results are much more robust when we can see that the trends that you report are replicable across 3 iterations of the experiment. Congratulations on a greatly improved study and paper! I have several minor concerns and suggestions, listed below:

      38-39: I think it is clearer to say "no statistically significant effect of season on hatchability of eggs" ... or specify if you are referring to blood or the interaction of blood and season. It isn't clear which treatment you are referring to here.

      Changed as suggested.

      54-57: This could be stated more succinctly. Instead of citing papers that deal with specific examples of patterns, I would suggest citing a review paper that defines these terms.

      Changed as suggested.

      83-84: What if another migratory bird is the preferred host in Argentina? I would state this more cautiously (e.g. "may not be applicable...").

      Changed as suggested.

      95-96: I don't understand what you mean by this. These hypotheses are specifically meant to understand mosquitoes that DO have a distinct seasonal phenology, so I'm not sure why this caveat is relevant. And naturally this hypothesis is host dependent, since it is based on specific host reproductive investments. I think that the strongest caveat to this hypothesis is simply that it hasn't been proven.

      Changed as suggested.

      97-115: This is a great paragraph! Very clear and compelling.

      Thanks for your words!

      118: Do you have an exact or estimated number of rafts collected?

      Sorry, I do not have the exact number of rafts, but it was at least 20-30.

      135: "over twenty" was changed to "several"; several would imply about 3 generations, so this is misleading. If the colony was actually maintained for over twenty generations, then you should keep that wording.

      Changed as suggested.

      163-164: Can you please clarify whether the replicates were conducted at separate time points?

      Changed as suggested.

      Note: the track changes did not capture all of the changes made; e.g. 163-164 should show as new text but does not.

      You are absolutely right; when I uploaded the last version, I unfortunately deleted all tracked changes and cannot recover them. In this new version, I will ensure that all minimal changes are included as tracked changes.

      186 - 189: the terms should be "fixed effect" and "random effect"

      Changed as suggested.

      191: Edit: linear

      Changed as suggested.

      194: why was replicate not included as a random effect here when it was above? Also, can you please clarify "interaction effects"? Which interactions did you include?

      Changed as suggested. This is explained above and in the methodology. Hatchability models with a random-effect variable were poorly fitted and validated. The interaction included for hatchability was four-way (season, blood source, cycle and replicate).

      207-208: I'm not sure what you mean by "aimed to achieve"? Weren't you doing this after you conducted the experiments, so wouldn't this be determining the power of your model (post-hoc power analysis)? Also, I think you should provide the parameter estimates that were used (e.g. effect size - did you use the effect size you estimated across the 3 replicates?).

      Changed as suggested.

      214-215: this should be reworded to acknowledge that this is estimated for the given effect size; for example, something like "This sample size was sufficient to detect the observed effect with a statistical power of 0.8" or something along those lines (unless I am misunderstanding how you conducted this test).

      Changed as suggested.

      246. Abbreviate Culex

      Changed as suggested.

      253-255: This sentence isn't clear. What do you mean by mixed? Also, the season really seemed to mainly impact the fitness of mosquitoes fed on mouse blood and here the way it is phrased seems to indicate that season has an impact on the fitness of those fed with chicken blood.

      Changed as suggested.

      258-260: You stated your hypothesis as the relative fitness shifting between seasons, but this statement about the expectation is different from your hypothesis stated earlier. Please clarify.

      You are right. Thank you for noting this. It was changed as suggested.  

      263-266: I also don't understand this sentence; what does the first half of the sentence have to do with the second?

      Changed as suggested.

      269-270: This doesn't align with your observation exactly; you say first AND second are generally most productive, but you observed a drop in the second. Please clarify this.

      Changed as suggested.

      280: I suggest removing "as same as other studies"; your caveats are distinct because your experimental design was unique

      Changed as suggested.

      287: you shouldn't be looking for a "desired" effect; I suggest removing this word

      Changed as suggested.

      288: It wasn't really a priori though, since you conducted it after your first replicate (unless you didn't use the results from the first replicate you reported in the original drafts?)

      It was a posteriori. Changed as suggested.

      290: Why is 290 written here?

      It was a mistype. Deleted as suggested.

      291-298: The meaning of this section of your paragraph is not clear.

      Improved as suggested.

      304-313: This list of 3 explanations are directed at different underlying questions. Explanations 1 and 2 are alternative explanations for why host switching occurs if not due to differences in fitness. This isn't really an explanation of your results so much as alternative explanations for a previously reported phenomenon. And the third is an explanation for why you may not have observed the expected effect. I suggest restructuring this to include the fact that Argentinian quinqs may not host switch as part of your previous list of caveats. Then you can include your two alternative explanations for host switching as a possible future direction (although I would say that it is really just one explanation because "vector biology" is too broad of a statement to be testable). Also, you haven't discussed possible explanations for your actual result, which showed that mosquito fitness decreased when feeding on mouse blood in autumn conditions and in the second gonotrophic, while those that fed on chicken did not experience these changes. Why might that be?

      The discussion was restructured to include all these suggested changes. Additionally, some possible explanations of our results are now discussed.

      315-317: This statement is vague without a direct explanation of how this will provide insight. I suggest removing or providing an explanation of how this provides insight to transmission and forecasting.

      Changed as suggested.

      319-320: According to eLife policy, all data should be publicly available. From guidelines: "Media Policy FAQs Data Availability Purpose and General Principles To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all data associated with an article to be made freely and widely available. These must be in the most useful formats and according to the relevant reporting standards, unless there are compelling legal or ethical reasons to restrict access. The provision of data should comply with FAIR principles (Findable, Accessible, Interoperable, Reusable). Specifically, authors must make all original data used to support the claims of the paper, or that is required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). This must include all variables, treatment conditions, and observations described in the manuscript. The authors must also provide a full account of the materials and procedures used to collect, pre-process, clean, generate and analyze the data that would enable it to be independently reproduced by other researchers."

      - so you need to make your data available online; I also understand the last sentence to indicate that code should be made available.  

      Data and code will be publicly available.

      Table 1: it is notable that in replicate 2, the autumn:mouse:gonotrophic cycle II fecundity and fertility are actually higher than in the summer, which is the opposite of reps 1 and 3 and the overall effect you reported from the model. This might be worth mentioning in the discussion.

      Mentioned in the discussion as suggested.

      Tables 1 and 2: shouldn't this just be 8 treatments? You included replicate as a random effect, so it isn't really a separate set of treatments.

      This table reflects the output of the whole experiment, which is why it presents all 24 experiments.

      Figure 3: Can you please clarify if this is showing raw data?

      Changed as suggested.

      Note: grammatical copy editing would be beneficial throughout

      Grammar was improved as suggested.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      Minor comments:

      Figure 5A- The authors should ensure that the figure is properly labeled to clearly distinguish between the IP (Immunoprecipitation) panel and the input panel.

      Thank you for your suggestion. We have replaced Figure 5 in this version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/ stability of NSP16 of SARS CoV2 via K48 and K27-linked Ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, an E3 ubiquitin ligase bring about the ubiquitination of NSP16. The concept, and experimental approach to prove the hypothesis looks ok. The in vivo data looks ok with the controls. Overall, the manuscript is good.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Comments on revisions:

      I have gone through the revised form of the manuscript thoroughly. The authors have addressed all of my concerns. To me, the experimental approach looks convincing that the host E3 ubiquitin ligases (UBR5 and MARCHF7) ubiquitinate NSP16 and mark it for proteasomal degradation via K48- and K27- linkage. The authors have represented the final figure (Fig.8) in a convincing manner, opening a new window to explore the mechanism of capping the vRNA by NSP16.

      Thank you for your recognition.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary, and Strengths:

      The authors and their team have investigated the role of Vimentin Cysteine 328 in epithelial-mesenchymal transition (EMT) and tumorigenesis. Vimentin is a type III intermediate filament, and cysteine 328 is a crucial site for interactions between vimentin and actin. These interactions can significantly influence cell movement, proliferation, and invasion. The team has specifically examined how Vimentin Cysteine 328 affects cancer cell proliferation, the acquisition of stemness markers, and the upregulation of the non-coding RNA XIST. Additionally, functional assays were conducted using both wild-type (WT) and Vimentin Cysteine 328 mutant cells to demonstrate their effects on invasion, EMT, and cancer progression. Overall, the data supports the essential role of Vimentin Cysteine 328 in regulating EMT, cancer stemness, and tumor progression. Overall, the data and its interpretation are on point and support the hypothesis. I believe the manuscript has great potential.

      The authors are thankful to the reviewers for carefully reading the manuscript, evaluating the data, making positive comments, and supporting our conclusions.

      Weaknesses:

      Minor issues are related to the visibility and data representation in Figures 2E and 3 A-F

      We have revised the figures (Figure 2E and Figure 3A-F) to increase the data visibility.

      Reviewer #2 (Public review):

      The aim of the investigation was to find out more about the mechanism(s) by which the structural protein vimentin can facilitate the epithelial-mesenchymal transition in breast cancer cells.

      The authors focussed on a key amino acid of vimentin, C238, its role in the interaction between vimentin and actin microfilaments, and the downstream molecular and cellular consequences. They model the binding between vimentin and actin in silico to demonstrate the potential involvement of C238, but the outcome is described vaguely.

      We have expanded the discussion of these results in the manuscript to more explicitly describe the critical role of C238 in the vimentin-actin interaction. Specifically, we highlight that C238 lies within a region of the vimentin rod domain known to mediate key protein-protein interactions. Our modeling shows that the thiol group of C238 enables specific hydrogen bonding and potential disulfide-mediated interactions with actin, which are disrupted upon mutation to serine. These findings provide mechanistic insight into the functional importance of this residue.

      The phenotype of a non-metastatic breast cancer cell line MCF7, which doesn't express vimentin, could be changed to a metastatic phenotype when mutant C238S vimentin, but not wild-type vimentin, was expressed in the cells. Expression of vimentin was confirmed at the level of mRNA, protein, and microscopically. Patterns of expression of vimentin and actin reflected the distinct morphology of the two cell lines. Phenotypic changes were assessed through assay of cell adhesion, proliferation, migration, and morphology and were consistent with greater metastatic potential in the C238S MCF7 cells. Changes in the transcriptome of MCF7 cells expressing wild-type and C238S vimentins were compared and expression of Xist long ncRNA was found to be the transcript most markedly increased in the metastatic cells expressing C238S vimentin. Moreover changes in expression of many other genes in the C238S cells are consistent with an epithelial mesenchymal transition. Tumourigenic potential of MCF7 cells carrying C238S but not wild-type, vimentin was confirmed by inoculation of cells into nude mice. This assay is a measure of the stem-cell quality of the cells and not a measure of metastasis. It does demonstrate phenotypic changes that could be linked to metastasis.

      shRNA was used to down-regulate vimentin or Xist in the MCF7 C238S cells. The description of the data is limited in parts and data sets require careful scrutiny to understand the full picture. Down-regulation of vimentin reversed the morphological changes to some degree, but down-regulation of Xist didn't.

      This is understandable given the fact that vimentin interacts with actin which is known to determine cell shape. XIST being a non-coding RNA will not have the same effect.

      Conversely, down-regulation of XIST inhibited cell growth, a sign of reversing metastatic potential, but down-regulation of vimentin had no effect on growth.

      XIST is known to get induced in a number of cancers (see Figure 3E) which is consistent with our observation that its downregulation will inhibit cell growth. However, downregulation of vimentin had no effect on growth which is consistent with our previously published observation that ectopic expression of wildtype vimentin in MCF-7 cells did not influence cell growth (Usman et al Cells 2022, 11(24), 4035; https://doi.org/10.3390/cells11244035).

      Down-regulation of either did inhibit cell migration, another sign of metastatic reversal.

      We have previously shown that ectopic expression of wildtype vimentin in MCF-7 stimulates cell migration due to downregulation of CDH5 (endothelial cadherins) (Usman et al Cells 2022, 11(24), 4035). Therefore, downregulation of vimentin is expected to inhibit cell migration, which is what we observed in this study. Why downregulation of XIST inhibited cell migration is not clear. It is conceivable that XIST downregulation affects Lamin expression which may suppress intercellular interactions to increase cell migration. This hypothesis is supported by the fact that vimentin expression in MCF-7 affects Lamin expression (Usman et al Cells 2022, 11(24), 4035).

      The interpretation of this type of experiment is handicapped when full reversal of expression is not achieved, as was the case in this study.

      Full reversal of any biological effect is almost impossible to achieve because shRNAs are by nature not 100% effective. This could, however, be tested using CRISPR-Cas9 gene editing to completely knock out a protein (this can’t be used for XIST as it is a non-coding RNA). In that case one has to assume that it will have no off-target effects.

      Overall the study describes an intriguing model of metastasis that is worthy of further investigation, especially at the molecular level to unravel the connection between vimentin and metastasis. The identification of a potential role for Xist in metastasis, beyond its normal role in female cells to inactivate one of the X chromosomes, corroborates the work of others demonstrating increased levels in a variety of tumours in women and even in some tumours in men. It would be of great interest to see where in metastatic cells Xist is expressed and what it binds to.

      The authors fully agree that it is an interesting model of metastasis/oncogenesis that requires further investigation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We would like to extend our appreciation for your efforts. Your constructive suggestions will help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included the data of HER2 positive breast cancer patients who received trastuzumab in the I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset. We highlighted the usage of “pre-treatment breast cancer tumors” in Line 309 and included an overview of the transcriptomic data analysis of the I-SPY2 trial in Figure S1F.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree with your opinion, and the exploration of multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we would continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. The exploration of metastatic disease would make our research more meaningful and help better address clinical key issues. In our future studies, we will continue to investigate the association between the invasive and metastatic capabilities of trastuzumab resistant HER2 positive breast cancer and cysteine metabolism.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We added statistical information in our figure legends, including Line 849-850, Line 865-867, Line 881-882, Line 898-900, Line 910-911 and Line 923-924.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We have modified this figure to demonstrate it more clearly. In Figure 3K, the significance was determined by one-way ANOVA and the comparison presented was relative to the DMSO control. It was indicated that the combination of erastin or cysteine starvation and trastuzumab could increase lipid peroxidation, although trastuzumab monotherapy did not induce ferroptosis.

      Additionally, the combination of erastin and trastuzumab could result in more lipid peroxidation than erastin alone. Similar results were also found for the combination of cysteine starvation and trastuzumab. These results showed that targeting cysteine metabolism plus trastuzumab could have synergistic effects in inducing ferroptosis in trastuzumab resistant HER2 positive breast cancer.

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative of trastuzumab-resistant cell line, and SKBR3 as a representative of trastuzumab sensitive cell line. As such, these findings could be cell-line specific while irrelevant to trastuzumab sensitivity at all. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback would help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab resistant) and SKBR3 (trastuzumab sensitive), and this is a limitation of our study. The experimental validation using different cell lines will make our research findings more persuasive. In our future research, we will continuously optimize experimental design and methods to make our findings more comprehensive.

      (2) The detection of ferroptosis in our research was mainly performed by evaluating the lipid peroxidation. Experiments measuring cell viability and rescuing effects would help provide more evidence.

      We utilized CCK8 tests to compare the viability of JIMT1 and SKBR3 cells at different erastin and RSL3 concentrations, as well as after different exposure times to cysteine starvation. It was shown that JIMT1 was more sensitive to erastin and RSL3, but tolerant to cysteine starvation, which was consistent with the previous lipid peroxidation tests. This data was included in Figure S5C-E. We added the description in Line 375-379.

      In addition, we also performed experiments to explore the rescuing effects of the ferroptosis inhibitor Fer-1. It was indicated that Fer-1 could suppress the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provided more evidence that cysteine metabolism played a vital role in modulating HER2 positive breast cancer ferroptosis. This data was included in Figure S5G and S5H. We added the description to Line 387-391.

      (3) In xenograft experiments, cysteine starvation was performed by feeding a cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Line 236-237 in Methods.

      We agree with your opinion on the role of erastin in experiments in vivo. We have tried to optimize drug dissolution and other conditions by referring to previous relevant literature. We would continue to improve our experimental design and methods.

      (4) Epigenetic modifications have been recognized as crucial factors in drug resistance formation. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. Currently, the role of epigenetic changes in the development of trastuzumab resistance in HER2 positive breast cancer is still being explored. We tried to investigate the dysregulation of histone modifications and DNA methylation in trastuzumab resistant HER2 positive breast cancer. Our findings indicated that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This would provide more evidence in exploring trastuzumab resistance mechanisms. We have provided a detailed discussion on Line 598-607.

      We would like to extend our appreciation for your constructive suggestions and continue to improve our research in future experiments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 334: it would be helpful to clarify that JIMT1 cells are trastuzumab-resistant while SKBR3 cells are trastuzumab sensitive, especially for those not familiar with breast cancer cell lines.

      Thank you for your valuable recommendations. We added the description of trastuzumab sensitive SKBR3 and trastuzumab resistant JIMT1 on Line 334-335.

      (2) Figure 3: the concentrations of erastin and RSL3 should be indicated.

      Thank you for your valuable recommendations. In Figure 3, the concentration of erastin was 10 μM and that of RSL3 was 1 μM. We added these details in the figure legends on Line 872-873.

      (3) Figure 3: lipid peroxidation does not necessarily mean ferroptosis. Cell viability data and rescuing effects of ferroptosis inhibitors should be shown.

      Thank you for your valuable recommendations. As we mentioned above, we utilized CCK8 tests to compare the viability of JIMT1 and SKBR3 cells at different erastin and RSL3 concentrations, as well as after different exposure times to cysteine starvation. Consistent with the lipid peroxidation tests, JIMT1 was more sensitive to erastin and RSL3, but tolerant to cysteine starvation. This data was included in Figure S5C-E. We added the description in Line 375-379.

      As described above, we also performed experiments to explore the rescuing effects of the ferroptosis inhibitor Fer-1. It was indicated that Fer-1 could suppress the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provided more evidence that cysteine metabolism played a vital role in modulating HER2 positive breast cancer ferroptosis. This data was included in Figure S5G and S5H. We added the description to Line 387-391.

      (4) Figure 3H: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. We performed cell culture with cysteine starvation by utilizing cystine/cysteine-deficient DMEM (BIOTREE) and 1% penicillin-streptomycin at 37℃ with 5% CO2. We added details of this medium on Line 141-143 in Methods.

      (5) Figure 4: the meaning of "H" should be clarified.

      Thank you for your valuable recommendations. “H” denotes trastuzumab. We clarified the meaning of “H” in the figure legends on Line 898.

      (6) Figure 4B & 4C: the data of "H" group and "Erastin" group are inconsistent.

      Thank you for your valuable recommendations. In the in vivo experiments, the tumor volume changes were analyzed using a paired approach, comparing the tumor size of each individual mouse before and after treatment. We noticed the confusion this caused and added more details about our in vivo experiments on Line 240 in Methods and Line 892-893 in figure legends.

      (7) Figure 4: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. We performed cysteine starvation by utilizing cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Line 236-237 in Methods.
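
      For illustration only (this is not the authors' analysis code): the paired approach mentioned in the response to point (6) compares each mouse with itself before and after treatment. A minimal Python sketch, assuming the per-animal change is expressed relative to the pre-treatment volume; the function name and the example volumes are hypothetical.

```python
def paired_relative_change(before, after):
    """Return the per-animal relative volume change, (after - before) / before.

    `before` and `after` must list the same animals in the same order.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have one entry per animal")
    return [(a - b) / b for b, a in zip(before, after)]


# Hypothetical volumes in mm^3, for illustration only (not study data).
changes = paired_relative_change([100.0, 200.0], [150.0, 100.0])
print(changes)
```

      Because each animal serves as its own baseline, a summary of these changes may rank treatment groups differently from a summary of raw end-point volumes.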

      We have also corrected some grammatical errors in the manuscript. We would like to extend our great appreciation to all editors and reviewers for their invaluable contributions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Summary of revisions:

      Thanks to the careful review and comments from the reviewers, we restructured the introduction and the discussion to improve clarity and better contextualise findings. We notably discuss further the f<sub>sphere</sub> decrease observations in the cerebellum and the Tau-specific findings (Tau being a possible marker for Purkinje cells development and Tau switching compartment in the thalamus). We added material in Supplementary Information to support these discussion points. We added a figure to show the metabolic profiles normalised by water or by macromolecules and a figure and table related to a rough approximation of f<sub>sphere</sub>, leaning on existing literature. We report the DTI results for thoroughness.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold-standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed noninvasively. In particular, lower metabolite ADC with increasing age were measured in both brain regions, reflecting increasing cellular restriction with brain maturation. Higher sphere (representing cell bodies) fraction for neuronal metabolites (total NAA, glutamate) and total creatine and taurine in the cerebellum compared to the thalamus were estimated, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cells expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS at probing microstructure changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data. The assumption of metabolite compartmentation, forming the basis of cell-specific microstructure interpretation of dMRS data, remains debated and should be considered with care (Rae, Neurochem Res, 2014, https://doi.org/10.1007/s11064-013-1199-5). External cross-validation of some of the authors' claims, in particular taurine in the thalamus switching from neurons to astrocytes during brain development, would be a highly valuable addition to this study.

      R1.1: We understand the reviewer's concerns. Metabolic compartmentation is not a one-to-one correspondence. Although we interpret the results in the light of metabolic compartmentation, our results are not driven by this assumption. We could not perform a direct cross-validation of the taurine switch in the thalamus, but we now clarify in the discussion why the dMRS results themselves indicate a switch, and we integrate our results better with existing literature on taurine. We now discuss this in more detail for the cerebellar results too.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have shared publicly data and codes used for processing and fitting, which will allow one to reproduce or extend the scope of this work to disease populations, and which goes in line with the current effort of the MR(S) community for data sharing.

      Specific weaknesses:

      (1) This work lacks an introduction and a discussion about diffusion MRI, which is already a validated technique to assess brain development non-invasively. Although water lacks cell-specificity compared to metabolites, several studies have reported a decrease in water ADC and increased fractional anisotropy with brain maturation, associated with the myelination process and decreased water content (overview in Hüppi, Chapt. 30 of "Diffusion MRI: Theory, Methods, and Applications", Oxford University Press, 2010). Interestingly, the same observations are found in this work (decreased ADC with age for most metabolites in both brain regions), which should have been commented on. Moreover, the authors could have reported water diffusion properties in addition to metabolites', as I believe the water signal, used for coil combination and/or Eddy currents corrections, is usually naturally acquired during diffusion-weighted MRS scans.

      R1.2: Thank you for these helpful suggestions. We have now improved our introduction of the various modalities, and we contextualise the study in light of previous DTI findings, as suggested by the reviewer. We agree with the reviewer that the comparison with previous human DTI is relevant, and we now mention it at the beginning of the discussion. However, the very different nature of the dMRS signal compared to dMRI (intracellular and absence of exchange for metabolites) prevents us from drawing any strong conclusions.

      (2) It is unclear why the authors have normalized metabolite concentrations (measured from low b-values diffusion-weighted MRS spectra) to the macromolecule concentrations. First, it is not specified whether in vivo macromolecules were acquired at each age or just at one time point. Second, such ratios are not standard practice in the MRS community so this choice should have been explained. Third, the macromolecule content was reported to change with age (Tkac et al., Magn Reson Med, 2003), therefore a change in metabolite to macromolecule ratio with age cannot be interpreted unequivocally.

      R1.3: We agree with the reviewer that this needed further explanation. We now clarify in the Results section “Metabolic profile changes with age” the reasoning behind choosing macromolecules for normalisation. We also added in the Supplementary Information the changes in metabolite concentrations with age when normalising by water, and a direct comparison with MM normalisation (Figure S2).

      (3) Some discussion is missing about the choice of the analytical biophysical model (although a few are compared in Supplementary Materials), in particular: is a model of macroscopic anisotropy relevant in cerebellum, made of a large fraction of oriented white matter tracks, and does the model remain valid at different ages given white matter maturation and the ongoing myelination process?

      R1.4: We agree with the reviewer that this is a valid concern. We actually acquired some standard DTI at the end of the acquisition sessions (where possible) having in mind the fibre dispersion estimation. However, data could not be acquired in all animals, and the data quality was poor (see Figure S8, the experimental conditions would have required further optimisation). We now add a couple of sentences at the beginning and in the end of discussion to address this limitation, and we include the DTI data in Supplementary Information.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to non-invasively track neuronal development in rat neonates, which they achieved with notable success. However, the direct relationship between the results and broader conclusions regarding developmental biology and potential human implications is somewhat overstretched without further validation.

      Strengths:

      If adequately revised and validated, this work could have a significant impact on the field, providing a non-invasive tool for longitudinal studies of brain development and neurodevelopmental disorders in preclinical settings.

      Weaknesses:

      (1) Consistency and Logical Flow:

      The manuscript suffers from a lack of strategic flow in some sections. Specifically, transitions between major findings and methodological discussions need refinement to ensure a logical progression of ideas. For example, the jump from the introduction of developmental trajectories and the technicalities of MRS (Magnetic Resonance Spectroscopy) processing on page 3 could benefit from a bridging paragraph that explicitly states the study's hypotheses based on existing literature gaps.

      R2.1: Thank you for this general feedback (along with your point (3)) that helped us restructure the introduction and the discussion to improve the clarity and flow.

      (2)  Scientific Rigour:

      While the novel application of diffusion-weighted MRS is commendable, there's a notable gap in the rigorous validation of this approach against gold-standard histological or molecular techniques. Particularly, the assertions regarding the sphere fraction and morphological changes inferred from biophysical modelling mandate direct validation to solidify the claims made. A study comparing these in vivo findings with ex vivo confirmation in at least a subset of samples would significantly enhance the reliability of these conclusions.

      R2.2: We agree with the reviewer that this would have been a great addition to the manuscript. Although we could not run new experiments to address these flaws, we now discuss the results more quantitatively, leaning on existing literature (addition of Figure S11 and Table S2). This helps us understand the results around Tau in both regions better, and illustrate the R<sub>sphere</sub> trend.

      (3) Clarity and Novelty:

      - The manuscript often delves deeply into technical specifics at the expense of accessibility to readers not deeply familiar with MRS technology. The introduction and discussions would benefit from a clearer elucidation of why these specific metabolite markers were chosen and their known relevance to neuronal and glial cells, placing this in the context of what is novel compared to existing literature.

      - The novelty aspect could be reinforced by a more structured discussion on how this method could change the current understanding or practices within neurodevelopmental research, compared to the current state of the art.

      R2.3: See answer to (1). By restructuring the introduction and the discussion, we hope to have addressed this point. We now discuss how these findings compare to the state of the art (notably added comparison with dMRI research). Along with the next comment, we better discuss potential implications of these findings for neurodevelopmental research.

      (4) Completeness:

      - The Discussion section requires expansion to offer a more comprehensive interpretation of how these findings impact the broader field of neurodevelopment and psychiatric disorders. Specifically, the implications for human studies or clinical translation are touched upon but not fully explored.

      - Further, while supplementary material provides necessary detail on methodology, key findings from these analyses should be summarized and discussed in the main text to ensure the manuscript stands complete on its own.

      R2.4: Thank you for these helpful suggestions. We now integrate the findings better into the existing literature. We notably discuss how the results might translate to humans.

      (5) Grammar, Style, Orthography:

      There are sporadic grammatical and typographical errors throughout the text which, while minor, detract from the overall readability. For example, inconsistencies in metabolite abbreviations (e.g., tCr vs Cr+PCr) should be standardized.

      R2.5: Thank you for the careful review. This has been corrected.

      (6) References and Additional Context:

      The current reference list is extensive but lacks integration into the narrative. Direct comparisons with existing studies, especially those with conflicting or supportive findings, are scant. More dedicated effort to contextualize this work within the existing body of knowledge would be beneficial.

      R2.6: Because the nature of this work is novel, it is difficult to find directly conflicting/similar works. However, we now integrate the findings into the broader literature.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Thank you for the careful review, we have addressed most of the minor comments, except for the last one, which we discuss below.

      - Some figures appear blurred in the printed PDF

      - Introduction: "constrained and hindered by cell membranes," - maybe use "restricted" instead of "constrained", like everywhere else in the text

      - Introduction: "(typically ~8cm3 vs ~8mm3 in dMRI in humans)" - here I suggest to put the rat brain sizes instead to help the reader understand how small the voxel was at P5 in this study, thus explaining the challenges

      - Fig 1 - numbers 1 and 2 on panel A,B should be clarified and they do not match 1 and 2 on panel C, which is confusing

      - Fig 2 - I am guessing the large dots are the mean and small are individual data points? Please clarify

      - Please specify "Relative CRLB" rather than just "CRLB", in supp. mat as well

      - Fig 3 - title of panel B, I would change "signal" into "concentration"

      - Fig 3 - end of caption: "and levelled to get Signal(tCr,P30)/Signal(MM,P30)=8", I think "in the thalamus" is missing

      - The results section "Biophysical modelling underlines different developmental trajectories of cell microstructure between the cerebellum and the thalamus" is sometimes imprecise, e.g.: "Cerebellum: The sphere fraction and the radius estimated from tNAA diffusion properties vary with age.", but the tNAA sphere fraction seems to vary more with age in the thalamus according to Table 1; "Cerebellum: fsphere decreases from 0.63 (P10) to 0.41 (P30), but R is stable": this is for tCr, I presume

      - Table 1 - "pvalues" please add "before multiple comparison correction"

      - Figure 5 - Panel B, the L-segment subpanel is unclear -which metabolites is it referring to? Why does Tau have a * in panel A?

      - Update Ref 37 to the journal version

      - Methods: "A STELASER (Ligneul et al., MRM 2017) sequence", add numbered reference instead

      - Please specify that the DIVE toolbox uses Gaussian phase distribution approximation, it is important for the dMRS reader given that your diffusion gradient length is long and cannot be neglected, and that the SGP approximation does not apply.

      The Gaussian phase distribution approximation and the SGP approximation are two different concepts. The gradient duration δ (7 ms) is short compared to the gradient separation Δ (100 ms), but it could still be considered too long for the SGP approximation to hold. However, the gradient duration is accounted for in DIVE in any case.
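
      To give a sense of scale, the effect of a finite gradient duration can be sketched for the simplest case of free (unrestricted) diffusion with rectangular gradient pulses, where the Stejskal-Tanner b-value is γ²G²δ²(Δ − δ/3) and the narrow-pulse limit drops the δ/3 term. The Python sketch below is an illustration under that assumption only; it is not the DIVE implementation, it does not model restricted diffusion in spheres, and the function names and the gradient amplitude are hypothetical.

```python
GAMMA_1H = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1 (approximate)


def b_value_finite_pulse(g, delta, separation, gamma=GAMMA_1H):
    """Stejskal-Tanner b-value (s/m^2) for rectangular pulses of duration delta."""
    return (gamma * g * delta) ** 2 * (separation - delta / 3.0)


def b_value_narrow_pulse(g, delta, separation, gamma=GAMMA_1H):
    """b-value (s/m^2) in the narrow-pulse limit, ignoring the delta/3 term."""
    return (gamma * g * delta) ** 2 * separation


def finite_pulse_ratio(delta, separation):
    """Ratio of finite-pulse to narrow-pulse b-value: 1 - delta / (3 * separation)."""
    return 1.0 - delta / (3.0 * separation)


# Timings quoted in the response: 7 ms duration, 100 ms separation.
delta, separation = 7e-3, 100e-3
g = 0.3  # T/m, hypothetical gradient amplitude used only for illustration

ratio = b_value_finite_pulse(g, delta, separation) / b_value_narrow_pulse(g, delta, separation)
print(round(ratio, 4), round(finite_pulse_ratio(delta, separation), 4))
```

      With these timings the two free-diffusion expressions differ by δ/(3Δ), roughly 2%. For restricted compartments the dependence on δ takes a different form, so this figure should not be read as the size of the effect on the fitted sphere parameters.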

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation. Both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility. We have clarified this statement (page 11) to say that we reverted to Ala because it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section has been clarified (page 16). The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      The section now reads:

      “We therefore assume that this is the reason for the lack of observed density in this region of the peptides D20 and D21 (Fig. S3E and S3F, respectively). We believe that it causes a reduction in binding affinities of all peptides in crystallo, given the evidence from SPR highlighting a role of position 7 in the interaction (Table 1). Interestingly, the observed electron density of the peptide correlates with Cdc20 binding affinity: D21 and D20, having the highest affinities, display the clearest electron density allowing six amino acids to be modeled, whereas D7 shows relatively poor density permitting modelling of only four residues. For D19, the lack of density observed likely reflects its intrinsically weaker affinity compared to the other peptides, in addition to losing the interactions from position 7 due to crystal packing.”

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      We have added the following text to the Results section “Design of D-box peptides” (page 10):

      “We focused on D-box peptides, as there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study that tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated (Qin et al. 2017). They observed that, whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study (Hartooni et al. 2022) of binding affinities of different degron peptides concluded that KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.”

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high-affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in the affinity of a peptide by adding cooperativity to the interaction of the D-box with the co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high-affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in the affinity of a peptide by adding cooperativity to the interaction of the D-box with the co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) On page 12 (towards the end), the author stated D10 contained an A3P mutation, they meant P3A right? 'To test this hypothesis, we proceeded to synthesise D10, a derivative of D4 containing an A3P single point mutation.'

      We thank the reviewer for spotting this typo, which we have corrected.

      (2) Have the authors considered other orthogonal approaches to cross-examine/validate binding affinities? That said, I do not think extra experiments are necessary.

      We did not explore further orthogonal approaches due to the challenges of producing sufficient amounts of the Cdc20 protein. Due to the low affinities of many peptides for Cdc20, many techniques would have required more protein than we were able to produce. We believe that the qualitative TSA combined with the SPR is sufficient to convince the readers; indeed, there is a correlation between SPR-determined binding affinities and the thermal shifts. For the natural amino acid-containing peptides (Table 1), D19 has the highest affinity and causes the largest thermal shift in the Cdc20 melting temperature, D10 has the lowest affinity and causes the smallest thermal shift, and D1, D3, D4, and D5 all rank in the middle by both techniques. For those peptides containing unnatural amino acids (Table 2), again higher affinities are reflected in larger thermal shifts.

      Reviewer #2 (Recommendations for the authors):

      The data seem fine to me. I would appreciate a little more detail on the points mentioned in the public review. Also a thorough reread, maybe by a disinterested party as there are various typos that could be corrected - all in all an excellent clear paper that encompasses a lot of work.

      A colleague has carefully checked the manuscript, and typos have been corrected.

    1. Author response:

      We wish to express our gratitude to the reviewers for their insightful and constructive comments on the initial version of our manuscript. We greatly value their observations and have every intention of addressing their remarks in a thorough and constructive manner. Based on the editors’ and reviewers’ feedback, we realize that it was not entirely clear that we intended this work primarily to be a resource and not yield strong insights into DNN-human alignment. Since our method also covers the broad range of natural objects - as used in the vast majority of studies on object processing - we also feel we did not sufficiently highlight the breadth of the tool. Based on the editors’ assessment, our explorations into the limits of the method - which we saw as a strength, not a weakness of our work - perhaps overshadowed the otherwise broad applicability somewhat. We hope to clarify this in the revised manuscript. Beyond these general points, we would like to address the following four points:

      • Where feasible, we intend to undertake additional analyses and refine existing ones. For instance, we plan to provide noise ceilings for all datasets where such calculations are possible, and we plan to give careful consideration to implementing a permutation or label-shuffling test to explore some of the ideas shared by the reviewers.

      • We plan to discuss more thoroughly several topics raised by the reviewers (e.g., how our approach might contend with different experimental situations, such as when using line drawings as stimuli).

      • We aim to enhance the clarity of our manuscript throughout. This will include refining the wording of our abstract and offering a more detailed explanation of the methods employed in the fMRI analyses.

      • We plan to elaborate further on our line of reasoning by addressing potential sources of misunderstanding—such as clarifying what we mean by a “lack of data” and providing greater detail regarding the nature of the 49-dimensional embedding.

    1. Author response:

      The evidence supporting this mechanism is incomplete, with additional work needed to clarify SHP-1's role, the contribution of Fc receptor crosslinking, and the biological relevance across normal and malignant B cells. 

      We will address these points by:

      - including SHP-1 inhibitors in the DuoHexaBody-CD37 cytotoxicity experiments to address the role of SHP-1

      - investigating which Fc receptors are involved in the crosslinking by using FcR blocking antibodies and/or purified fixed effector cells that express different Fc receptors in the DuoHexaBody-CD37 cytotoxicity experiments

      - studying the effect of DuoHexaBody-CD37 on normal B cells

      As the findings are based primarily on in vitro models, further validation would be required to support broader translational conclusions.

      We would like to refer to previous studies that showed potent cytotoxicity of DuoHexaBody-CD37 in vivo, including in xenograft and PDX lymphoma models, which support broader translational conclusions:

      Oostindie et al. Blood Cancer Journal (2020) 10:30 https://doi.org/10.1038/s41408-020-0292-7

    1. Author response:

      We thank the reviewers for their comments and for their constructive suggestions. We intend to submit a revised manuscript where we address the comments made in the Public Reviews as well as in the Recommendations for the Authors.

      One of our most interesting findings, as noted by the reviewers, was the discovery of a small subpopulation of cells likely arrested in G2 that accounts for a disproportionate amount of radiation-induced gene expression. In addition to the responses indicated below, we are planning to include additional “wet lab” experiments in the revised manuscript that address the properties of this seemingly important subpopulation of cells.

      Reviewer 1:

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Thank you for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      (2) Some of the figures are unclear.

      In the revised manuscript, we will provide a more detailed quantitative analysis. For each condition, we analyzed 4 - 9 discs.

      We assume that the reviewer is referring to panels in Figure 1. We will review these images and, if necessary, repeat the experiments or choose alternative images that appear clearer.

      Reviewer 2:

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      We intend to include more “wet lab” experiments in our revised manuscript to address the identity and properties of the high-trbl cells that we have identified using the clustering approach based on cell-cycle gene expression.

      Reviewer 3:

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Thank you.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNAseq data from irradiated cells is tricky. Many cells have already died. Even among those that do not incorporate propidium iodide, some are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Clusters 0, 1, and 2 likely contain cells in other stages of the cell cycle, including early G1. Other studies indicate that more than 70% of cells are expected to have a 4C DNA content 4 h after irradiation at 4000 Rad. The high-trbl cluster only accounts for 18% of cells. Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs. We are mining the gene expression patterns in these clusters with the goal of estimating their location in the cell cycle and will include those data in the revised manuscript.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread. We intend to present additional data that address this point and also a more thorough discussion in the revised manuscript.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways: by location in the disc and by cell-cycle state. We will attempt to look for co-localization as suggested by the reviewer.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. We are investigating other possibilities such as the maturity of discs.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Thanks for the clear and accurate summary of our findings.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Thank you for the positive comments.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      We agree that we do not provide mechanistic or functional insights. Our goal was to produce hypothesis-generating datasets that our lab and others can use to direct functional or mechanistic studies.

      Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      Thank you for the positive comments.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Thank you for the positive comments. We too hope that future research will benefit from this dataset.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) In Figures 1E and 4A, the T1 and T2 glia subsets reveal sub-clusters for several cell types as seen by the distribution of points on the UMAP. This observation is never validated or discussed. Do these sub-clusters represent true differences in identities or are they artifacts of the single-nucleus preparation? For Figure 1E, it is not clear whether specific sub-clusters (see Ensheathing-4 vs Ensheathing-5 and Astrocyte-2 vs. Astrocyte-6) are differentially enriched between the T1 and T2 lineages. The existence of these sub-clusters must be discussed or dismissed.  

      We agree that this needs to be addressed more clearly in the manuscript and have made text changes in the Results and Discussion sections to clarify. We note that a recent glial cell atlas (Lago-Baldaia et al., 2023: PMID: 37862379) of the developing fly VNC and optic lobes found sub-clusters that mapped to the same subtype annotations. Interestingly, Lago-Baldaia and colleagues found that the transcriptional diversity of glia cell types did not match the morphological diversity of glia validated in vivo. See text changes below.

      Lines 131-133: “Similar to a previous glial cell atlas (Lago-Baldaia et al., 2023) we found some glial subtypes (astrocytes, ensheathing, and subperineurial) mapped to multiple clusters (Figure 1E, 1F).”

      Lines 206-208: “In line with our T1+T2 atlas and previous glia cell atlas (Lago-Baldaia et al., 2023), some subtypes mapped to several subclusters including ensheathing, astrocytes, and chiasm (Figure 4A-B).”

      Lines 397-401: “Similar to a recent glial cell atlas (Lago-Baldaia et al., 2023), we found glial subtypes like astrocytes, ensheathing, and subperineurial glia mapped to several sub-clusters (Figure 1E-F). It remains unclear if these sub-clusters with the same cell type annotation represent distinct glial identities or different transcriptional states within these populations.”

      (2) The authors present evidence for sex-specific neuronal and glia subtypes and find differential expression of specific yolk proteins and long non-coding RNAs. However, whether any of these differences are driven by other canonical sex-specific genes such as Fruitless (Fru) or Double-sex (Dbx) has not been reported or discussed. The authors must re-analyze their data for these genes and claim whether they have any contribution to sex-specific sub-clusters.

      Thank you for pointing this out. We have made text changes and clarifications to highlight the expression of other canonical sex-specific genes. Fru was enriched in male nuclei as expected. Interestingly, dbx was enriched in female nuclei. It remains to be determined whether these genes drive the sex-specific changes.

      Lines 224-226: “Additionally, female nuclei were enriched for dbx (Supp Table 8). Male glial nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5C; Supp Table 8) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 237-239: “Male nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5G; Supp Table 9) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 428-431: “We found the expected differential expression of yolk proteins (yp1, yp2, yp3) enriched in female nuclei and the long non-coding RNAs rox1/2 and fru enriched in male neuronal nuclei (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997; Warren et al., 1979). Interestingly, we found dbx to be enriched in both glial and neuronal female nuclei.”

      Lines 433-435: “It remains to be determined if these genes are driving these sex-specific differences in glia and neurons.”

      (3) In Figure 6C, it is unclear whether the Ms-2A-LexA-expressing neurons of clusters 157 and 160 project to two different neuropils or share projections to both neuropils. However, it is not explicitly shown in the immunostaining data whether indeed there are two populations to begin with. The authors must check for cluster 157 and 160 specific markers (such as Dh44 and ple) and test whether they appear mutually exclusively in the Ms-2A-LexA-expressing neurons. The same reasoning would apply to the data shown in Figures 6D and 6E, where the authors must test whether the NPF and AstA expressing cells are indeed neurons from clusters 100 and 128, using orthogonal cluster markers to conclude that they are similar (or the same) neurons.

      We changed the focus of the paragraph to confirm that these neurons indeed derive from type II neuroblasts and that they target the central complex. Although we cannot test the identity of each of these neurons because the necessary reagents are lacking, the staining still allowed meaningful interpretation and supports our ideas about neuropeptidergic cells in the central complex. We now state this limitation of the experiment explicitly to avoid drawing wrong conclusions.

      Minor points

      (1) Line 115 - "cluster that represents optic lobe neurons". How was this cluster identified?

      We reexamined the most significantly enriched genes in this cluster (cluster 124) and found that they are Rh2, ninaC, trpl, and other phototransduction-related genes (Supplemental Table 1). We reassigned the identity of this cluster as ocelli, which also express photoreceptor genes but cannot easily be removed during dissection. We modified the text as follows:

      "We used known markers (Croset et al., 2018; Davie et al., 2018; Supp Table 2) to identify distinct cell types in the central brain, including glia, mushroom body neurons, olfactory projection neurons, clock neurons, Poxn+ neurons, serotonergic neurons, dopaminergic neurons, octopaminergic neurons, corazonergic neurons, hemocytes, and ocelli (Figure 1B, Supp. Table 1)."

      (2) As the separation in Figure 1B is not obvious, annotated cell type clusters must be re-colored instead of being labelled as the exact dots are indistinguishable. This would especially be helpful for OCTY, SER, OPN, and CLK clusters.

      (3) Cluster labels in Figure 1C are barely visible and the font size must be increased for the reader. Recoloring the cluster identities and attaching a legend would again help in this case.

      We recolored the atlas in Figure 1B, 1C and 1C’ and increased the font size in Figure 1C’.

      (4) For Figure 4A, clusters should be labelled on the UMAP along with the legend as it is difficult for the reader to match identities using Seurat colors. The same is true for the UMAPs in Figure 5A.

      Yes, we agree that labeling would improve readability and have done so for UMAPs in Figure 4A and 5A-A’’.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Epiney et al. present a single-nucleus sequencing analysis of adult central brain neurons and glia. Through the use of an ingenious permanent labeling technique, they are able to trace the progeny of T2 neuroblasts, which contribute significantly to the formation of the central complex. This transcriptomic dataset is the first of its kind and will likely serve as a valuable resource for future studies.

      The authors further explore this dataset through several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. However, the approach used to identify the identity of each neuron cluster could be more clearly articulated, and some of the authors' conclusions are more generalized - either already well-established or lacking sufficient support.

      Detailed comments:

      Abstract - "Our data support the hypothesis that each transcriptional cluster represents one or a few closely related neuron subtypes." - Is this a novel finding? If so, it would be helpful if the authors could explain why this is the case more clearly.

      Our results are not generally novel, and many single cell/single nuclei RNA-seq papers have been published (more citations added to Introduction). Our work is novel in that we analyze Type 1 and Type 2 neuroblasts in the central brain.

      Line 53 - In the introduction the authors should also reference other single-cell studies done in the Drosophila brain.

      Done.

      Line 59 - There are some typos here. The authors could also mention type zero.

      Both done.

      Figure 1 and Sup Table 1 - Authors show in sup table 1 the top cell markers by cluster but there is no correspondence between cluster number and identity. The authors do not say which known markers were used to give the identity to each cluster.

      We have added the cell identity in the Supplemental Table 1. For the unknown cells, we left the column blank. We have also added a Supplemental Table 2 to show the markers we used to give identity to the clusters.

      Supplementary Tables - For each table, more detailed information should be provided regarding what is being compared and the methods used for these comparisons.

      We have added the methods we used in Seurat to generate each individual table.

      Line 138 - Differential gene expression analysis between T1 and T2 glial progeny did not show differences across any glial cell types (Supp Table 4). - Was this comparison done per cluster? Is differential gene expression of top markers, which are anyway the genes that define each glial cell type, enough for this type of analysis?

      Yes, we performed the differential expression analysis using all genes (i.e., not just marker defining) at a cluster-by-cluster resolution with results in Supplemental Table 4. We have edited the text to make this clarification.

      Lines 139-141: “Differential gene expression analysis for all genes between T1 and T2 glial progeny did not show differences across any glial cell types or clusters (Supp Table 4).”
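
      The cluster-by-cluster comparison described above can be pictured with a minimal sketch. It is not the Seurat analysis behind Supp Table 4: it only computes, for every gene, a per-cluster log2 fold change of mean expression between T2- and T1-derived cells, with no statistical test. The function name, the input layout, and the pseudocount are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2


def per_cluster_log2fc(cells, pseudocount=1.0):
    """Hypothetical helper: log2 fold change (T2 over T1) per cluster and gene.

    cells: list of dicts with keys 'cluster', 'lineage' ('T1' or 'T2')
           and 'expr' (dict mapping gene name -> expression value).
    Clusters that lack cells from either lineage are skipped.
    """
    sums = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    counts = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        cluster, lineage = cell["cluster"], cell["lineage"]
        counts[cluster][lineage] += 1
        for gene, value in cell["expr"].items():
            sums[cluster][lineage][gene] += value

    result = {}
    for cluster in list(sums):
        n_t1, n_t2 = counts[cluster]["T1"], counts[cluster]["T2"]
        if n_t1 == 0 or n_t2 == 0:
            continue
        # Use all genes seen in either lineage, not only marker genes.
        genes = set(sums[cluster]["T1"]) | set(sums[cluster]["T2"])
        result[cluster] = {
            gene: log2(
                (sums[cluster]["T2"][gene] / n_t2 + pseudocount)
                / (sums[cluster]["T1"][gene] / n_t1 + pseudocount)
            )
            for gene in genes
        }
    return result


# Toy input with made-up gene names and values.
toy_cells = [
    {"cluster": "astro", "lineage": "T1", "expr": {"geneA": 1.0, "geneB": 2.0}},
    {"cluster": "astro", "lineage": "T2", "expr": {"geneA": 3.0, "geneB": 2.0}},
]
toy_fc = per_cluster_log2fc(toy_cells)
```

      A value near zero for every gene in every cluster would correspond to the "no differences" outcome reported in the text; a real analysis would add a significance test and multiple-testing correction.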

      Line 146 - We identified T1-derived neurons by excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters. - Bioinformatically, correct?

      Yes. We clarified the sentence as follows:

      "We identified T1-derived neurons by bioinformatically excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters."

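      The bioinformatic exclusion step can be sketched as a simple filter. This is an illustration under stated assumptions, not the pipeline used in the manuscript: "co-expressing" is read here as all three transgene markers being detected above a threshold, and the data layout, function names, and threshold are hypothetical.

```python
T2_MARKERS = ("FLP", "GFP", "RFP")  # marker names taken from the text


def is_t2_labeled(expr, threshold=0.0):
    """True if every T2 marker is detected above the (assumed) threshold."""
    return all(expr.get(marker, 0.0) > threshold for marker in T2_MARKERS)


def select_t1_neurons(cells, glial_clusters, threshold=0.0):
    """Keep cells that are neither T2-labeled nor in a repo+ glial cluster.

    cells: list of dicts with keys 'id', 'cluster' and 'expr'.
    glial_clusters: set of cluster labels already annotated as repo+ glia.
    """
    return [
        cell
        for cell in cells
        if not is_t2_labeled(cell["expr"], threshold)
        and cell["cluster"] not in glial_clusters
    ]


# Toy input with made-up identifiers and values.
toy_cells = [
    {"id": "n1", "cluster": 3, "expr": {"FLP": 2.0, "GFP": 1.0, "RFP": 4.0}},
    {"id": "n2", "cluster": 7, "expr": {"GFP": 0.0}},
    {"id": "n3", "cluster": 5, "expr": {"GFP": 1.0}},
]
t1_ids = [cell["id"] for cell in select_t1_neurons(toy_cells, glial_clusters={7})]
```

      In this toy input only the cell that lacks the full marker combination and sits outside the glial cluster is retained.
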
      Line 156 - We found that each cluster strongly expressed a unique combination of genes. - As they are grouped by seurat in different clusters, why is this surprising?

      Line 175 - "top 10 significantly enriched genes gathered from each T2 neuron cluster" - can these lists be included?

      Yes, they are grouped by Seurat. We toned down the sentence and now refer to each combination of genes as cluster markers. We modified the sentence as follows:

      Each unique combination of enriched genes could be referred to as cluster markers.

      Line 211- How did the authors identify sex-biased clusters? How did the authors separate the samples/cells by sex? Was it done bioinformatically by the expression of certain genes? If so, which?

      We collected male and female nuclei separately. We have added text in the methods section as follows:

      "Equal amounts of male and female central brains (excluding optic lobes) were dissected at room temperature within 1 hour. The samples were flash-frozen in liquid nitrogen and stored separately at -80°C.

      In the first round, we pooled male and female brains together to select GFP+ nuclei and used particle-templated instant partitions to capture single nuclei and generate a cDNA library (Fluent BioSciences, Watertown, MA). In the second and third rounds, RFP+ nuclei from male and female brains were collected separately. The split-pool method was then used to generate barcoded cDNA libraries from each individual nucleus."

      Are there sex-specific differences in genes in glia other than genes that were previously known to be sex-specific?

      We report the comprehensive list of sex-specific differences in gene expression for both glia and neurons in Supp tables 8 and 9.

      Line 237 - When the authors mention "We conclude that male and female adult T2 neurons have sex-specific differences in gene expression within the same neuronal subtype" does this mean that these neurons are the same in male and in female brains, but they additionally specifically express sex-specific genes?

      Yes, we report that males and females contain the same neurons as defined by their transcriptional profiles. It remains to be seen whether these sex-specific differences change how the same neuronal subtypes function in males and females. We have added text to the Discussion to expand on this thought.

      Lines 437-441: “It remains to be determined if these genes are driving sex-specific differences within glial and neuronal subtypes. These genes may reflect sex-specific differences in the adult central brain and may provide insight into how behavioral circuits are linked to sex-specific behaviors. Future work should aim to characterize and test these genes.”

      Line 250 - The idea behind these sections "What is the relationship between neuropeptide expression and cluster identity?" "relation between cluster and morphology" lacks clarity. As clusters are defined based on principal component analysis, and the genes used to define a cluster are dependent on this method, there is no assumption that each cluster represents only one type of neuron or that it should include only neurons expressing the same neurotransmitter genes. Even if some clusters consist of a single neuron type, this should not be generalized to all clusters (and vice-versa).

      Correct, we cannot determine from the transcriptome data whether distinct clusters will have different morphology. We have changed the focus of the question to address that we are confirming they come from type 2 and that they target the central complex while comparing to known cells that express the neuropeptide.

      Line 265 - We first assayed the neuronal morphology of Ms+ neurons - why did the authors choose these neurons?

      Resolved in main text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB".

      Line 268 - "Currently we can't determine whether Ms+ neurons in clusters 157 and 160 project to different CX neuropils, or whether neurons from both clusters share projections into both neuropils. " - The purpose of this point is unclear.

      Resolved in text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 279 - This analysis could be more explored.

      Thank you for your feedback. As the comment was somewhat broad, we were unsure of the specific revisions needed and have therefore left the text unchanged.

      Line 301 - The text regarding this section, and the description and details of respective figures should be proofread to ensure clarity.

      Done.

      Line 386 - Alternatively, co-expression may be due to background from RNAs released during dissociation. - RNA in soup could be bioinformatically analysed.

      Correct. We opted to delete this sentence since our split-pool based method does not create background RNA expression. Additionally, the analysis is performed on scaled expression >2, and any background RNA is unlikely to yield such high expression.
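
      The threshold argument can be illustrated with a short sketch, assuming that "scaled expression" means a per-gene z-score across nuclei. The use of the sample standard deviation, the function names, and the toy values are assumptions for illustration only.

```python
from statistics import mean, stdev


def zscores(values):
    """Per-gene z-scores across cells; a constant gene scores 0 everywhere."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(value - mu) / sd for value in values]


def coexpressing_cells(expr_by_gene, gene_a, gene_b, cutoff=2.0):
    """Indices of cells whose scaled expression exceeds the cutoff for both genes."""
    scaled_a = zscores(expr_by_gene[gene_a])
    scaled_b = zscores(expr_by_gene[gene_b])
    return [
        index
        for index, (a, b) in enumerate(zip(scaled_a, scaled_b))
        if a > cutoff and b > cutoff
    ]


# Toy data for ten cells: two genes with one strongly expressing cell,
# and one gene present at a uniform low level in every cell.
toy_expr = {
    "pepA": [0] * 9 + [10],
    "pepB": [0] * 8 + [5, 10],
    "uniform": [1] * 10,
}
both = coexpressing_cells(toy_expr, "pepA", "pepB")
with_uniform = coexpressing_cells(toy_expr, "pepA", "uniform")
```

      A transcript present at a similar low level in every nucleus has little variance after scaling and therefore never crosses the cutoff, which is the intuition behind the statement that background RNA is unlikely to yield scaled expression above 2.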

      Discussion - Some of the conclusions are a bit too general, suggesting that the results might be meaningful, but also acknowledging the possibility of artifacts. If the authors could refine this, it would strengthen the manuscript.

      We are sorry but we are uncertain what you are asking; we don't know what you want us to refine. Our apologies for the misunderstanding.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      This review evaluates the SCellBOW framework, which applies phenotype algebra to obtain vectors from cancer subclusters or user-defined subclusters.

      Strengths:

      SCellBOW employs an innovative application of NLP-inspired techniques to analyze scRNA-seq data, facilitating the identification and visualization of phenotypically divergent cell subpopulations. The framework demonstrates robustness in accurately representing various cell types across multiple datasets, highlighting its versatility and utility in different biological contexts. By simulating the impact of specific malignant subpopulations on disease prognosis, SCellBOW provides valuable insights into the relative risk and aggressiveness of cancer subpopulations, which is crucial for personalized therapeutic strategies. The identification of a previously unknown and aggressive AR−/NElow subpopulation in metastatic prostate cancer underscores the potential of SCellBOW in uncovering clinically significant findings.

      Major concerns:

      The reliance on bulk RNA-seq data as a reference raises concerns about potentially misleading results due to the presence of RNA expression from immune cells in the TME. It is unclear if SCellBOW adequately addresses this issue, which could affect the accuracy of the cancer subcluster vectors.

      We appreciate the reviewer's concerns. To address the concern about potentially misleading results due to the TME when using bulk RNA-seq data as a reference:

      a. We account for systematic biases between the single-cell and bulk transcriptomics readouts by creating pseudo-bulk profiles for single-cell clusters, enabling more accurate comparisons [Section Materials and methods, Data preparation for phenotype algebra].

      b. We encode expressions into word vectors and co-embed them together. By doing this, we mitigate any possibility of systematic differences in the embedding. It is imperative that we subject both single-cell and bulk data through the same treatments because otherwise, it will be difficult to perform algebraic operations on them [Section Materials and methods, Generating vectors for phenotype algebra].

      c. In our new analysis of the tumor microenvironment, we have shown that SCellBOW effectively differentiates between malignant and non-malignant cells, confirming that it is not biased by the immune cell composition in the bulk RNA-seq data [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].
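
      Point (a) can be sketched as follows. The aggregation rule is an assumption: the sketch sums raw counts over all cells in a cluster, which is one common way to build a pseudo-bulk profile, and the function name and data layout are hypothetical rather than taken from SCellBOW.

```python
from collections import defaultdict


def pseudobulk(counts_per_cell, cluster_of_cell):
    """Sum per-cell gene counts into one profile per cluster.

    counts_per_cell: dict mapping cell id -> {gene: count}.
    cluster_of_cell: dict mapping cell id -> cluster label.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell_id, counts in counts_per_cell.items():
        cluster = cluster_of_cell[cell_id]
        for gene, count in counts.items():
            profiles[cluster][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {cluster: dict(genes) for cluster, genes in profiles.items()}


# Toy input with made-up cells, genes and counts.
toy_counts = {"c1": {"g1": 2, "g2": 1}, "c2": {"g1": 3}, "c3": {"g2": 5}}
toy_clusters = {"c1": "A", "c2": "A", "c3": "B"}
toy_profiles = pseudobulk(toy_counts, toy_clusters)
```

      Each resulting profile has the same shape as a bulk sample (one value per gene), which is what allows single-cell clusters and bulk tumors to be passed through the same downstream encoding.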

      The method of extracting vectors in phenotype algebra appears to be a straightforward subtraction operation. This simplicity might limit its efficiency in excluding associations with phenotypes from specific subpopulations, potentially leading to inaccurate interpretations of the data.

      Thanks for this excellent query. Vector algebra operations are not performed in the gene expression space (i.e., on gene expression vectors associated with tumor samples). Rather, we process the single-cell and bulk expression profiles through multiple steps (pseudo-bulk vector generation for single-cell clusters, mapping of gene expression values to word frequencies in a form better suited to the Doc2vec neural network, etc.) to ensure that their embeddings are consistent and capture intricate phenotypic information. We have demonstrated this through rigorous validation of the clusters yielded on various types of healthy and diseased samples. Furthermore, we have demonstrated the consistency of the vector algebra operations on known cancer subtypes in breast cancer, glioblastoma, and prostate cancer. We have clarified this further in the text. [Section Materials and methods, ‘Generating vectors for phenotype algebra’, ‘Survival risk attribution’].

      The review would benefit from additional validation studies to assess the effectiveness of SCellBOW in distinguishing between cancerous and non-cancerous signals, particularly in heterogeneous tumor environments.

      We thank the reviewer for advising this additional validation. While our study primarily focused on signals from malignant cells, we have now considered the impact of the tumor microenvironment. We observed that the predicted risk score increases when the immune component is subtracted from the tumor, suggesting that tumor aggressiveness increases in the absence of immune components. Importantly, the aggressiveness ranking of tumor subtypes (NE > ARAL > ARAH) remained consistent, confirming that SCellBOW effectively preserves subtype-specific risk stratification [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      Further clarification on how SCellBOW handles mixed-cell populations within bulk RNA-seq data would strengthen the evaluation of its applicability and reliability in diverse research settings.

      We really appreciate the reviewer’s observation. We clarify that rather than relying on absolute gene expression values, SCellBOW maps bulk RNA-seq data into an embedding space, where we extract the latent representation of the tumor. This process effectively masks the influence of mixed-cell populations, reducing biases introduced by immune or stromal components. Furthermore, phenotype algebra operates within this embedding space by comparing cosine similarities between latent representations of bulk and pseudo-bulk datasets, rather than using direct gene expression values. This allows SCellBOW to capture biologically meaningful relationships and infer tumor-specific signals effectively, even in the presence of heterogeneous cell populations. Our benchmarking across diverse cancer types confirms its effectiveness [Section Results, ‘SCellBOW enables pseudo-grading of metastatic prostate cancer tumor microenvironment’, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
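
      The two operations named above, subtraction in the embedding space and comparison by cosine similarity, can be sketched in a few lines. The vectors are toy values and the helper names are hypothetical; how such vectors feed into the survival-risk model is outside this sketch.

```python
from math import sqrt


def subtract(u, v):
    """Element-wise difference of two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_after_removal(tumor, cluster, reference):
    """Similarity of a reference embedding to 'total tumor - cluster'."""
    return cosine(subtract(tumor, cluster), reference)


# Toy two-dimensional embeddings (made-up values).
toy_tumor = [3.0, 1.0]
toy_cluster = [1.0, 1.0]
toy_reference = [1.0, 0.0]
toy_similarity = similarity_after_removal(toy_tumor, toy_cluster, toy_reference)
```

      Because cosine similarity depends only on direction, it is insensitive to the overall magnitude of a vector, which is one reason to compare embeddings this way rather than by absolute expression values.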

      Reviewer #2 (Public Review):

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single-cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Major concerns:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

      (1) We completely agree with the reviewer’s view. Our selection of glioblastoma and breast cancer for this study was primarily driven by the focus on extensively studied and well-defined cancer types. To demonstrate the effectiveness of our model, we tested it on advanced prostate cancer, which currently lacks clinically relevant and well-recognized stratification methods. This application to metastatic prostate cancer serves as a proof of concept, illustrating our model's potential to provide valuable insights into cancer types where established stratification approaches are limited or absent.

      (2) Regarding the application of our tool to spatial transcriptomics, we have already analyzed data from Digital Spatial Profiling (DSP). The article is already quite complex and involved, and we are afraid the inclusion of spatial transcriptomics may amount to a significant extension of the method. To this end, although we will discuss the future possibilities, we will skip the method validity check on spatial transcriptomics data.

      Reviewer #2 (Recommendations For The Authors):

      (1) "SCellBOW adapts the popular document-embedding model Doc2vec for single-cell latent representation learning, which can be used for downstream analysis...": Using only simple gene frequency might overlook the dependent relationships between genes, potentially compromising the biological significance. This could be discussed further.

      This is an excellent point raised by the reviewer. We acknowledge that using only simple gene frequency may overlook dependent relationships between genes, potentially compromising biological significance. To address this, we have now compared SCellBOW against gene expression data alone on the specific task of phenotype algebra and demonstrated that it captures meaningful biological relationships that are overlooked by simple gene frequency. We have added the results of this comparison, which show that gene expression data alone were insufficient for accurate risk stratification [Section Overall discussion, Supplementary Note 7, Supplementary Fig. 8i-k].
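
      For readers unfamiliar with the term, "gene frequency" encoding can be pictured with the sketch below. It assumes the simplest possible rule, repeating each gene name in proportion to its rounded expression value, and does not reproduce the exact SCellBOW preprocessing; the function name, scaling factor, and toy values are hypothetical.

```python
def expression_to_document(expr, scale=1.0):
    """Turn {gene: expression} into a list of gene-name 'words'.

    Each gene name is repeated round(expression * scale) times, so that
    word frequency in the resulting 'document' mirrors expression level.
    """
    words = []
    for gene in sorted(expr):
        repeats = max(int(round(expr[gene] * scale)), 0)
        words.extend([gene] * repeats)
    return words


# Toy profile with made-up values for three prostate-related genes.
toy_document = expression_to_document({"AR": 3.2, "KLK3": 0.4, "SYP": 1.0})
```

      A document built this way records how often each gene occurs but nothing about relationships between genes, which is the limitation the reviewer raises; the learned embedding is meant to recover part of that context.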

      (2) "While existing methods effectively reveal the subpopulations, they are insufficient in associating malignant risk with specific cellular subpopulations identified from scRNA-seq data....": Perhaps I missed it in the methods section, but how does SCellBOW compare to simply performing pseudobulk analysis on separate cell clusters, treating them as bulk RNA-seq, and then associating the signatures with disease prognosis?

      This is an insightful point, and we appreciate the opportunity to clarify it.

      (1) While pseudobulk analysis on separate cell clusters, followed by associating their signatures with disease prognosis, is a common approach, SCellBOW achieves this without requiring a priori knowledge of prognostic biomarkers to determine whether a subpopulation is aggressive.

      (2) Moreover, pseudobulk analysis aggregates gene expression across cells, which can potentially mask intra-cluster heterogeneity, thereby obscuring important signatures associated with disease prognosis. In contrast, the latent representation in SCellBOW captures the semantic meaning of disease aggressiveness, allowing for a more nuanced and biologically meaningful risk assessment.

      (3) "The proposed approach, SCellBOW, can effectively capture the heterogeneity and risk associated with each phenotype, enabling the identification and assessment of malignant cell subtypes in tumors directly from scRNA-seq gene expression profiles, thereby eliminating the need for marker genes...": Have the author compared the resulting group with well-known markers and do they overlap?

      We appreciate this thoughtful question. While SCellBOW does not rely on predefined marker genes for clustering or risk stratification, we have systematically evaluated whether the resulting subpopulations align with well-known markers. To assess this, we compared SCellBOW-derived clusters with established marker-based annotations across multiple datasets. We observed a significant overlap between SCellBOW clusters and canonical marker-defined cell types in various cancers, including GBM, BRCA, and mCRPC.

      (4) "We constructed three use cases leveraging publicly available scRNA-seq datasets...": The three training and testing datasets are all from healthy tissue. How about in tumor tissue? i.e., Could SCellBOW also identify better cell clusters in tumor datasets?

      We appreciate the reviewer’s inquiry. For benchmarking and method validation, we primarily selected normal tissue datasets as they are heavily annotated and well-characterized. Our goal was to extensively evaluate SCellBOW across different clustering metrics, including ARI, NMI, and SI, which required datasets with reliable ground truth. Tumor datasets, in contrast, often lack confirmatory ground truth, making direct benchmarking more challenging. However, to assess SCellBOW’s applicability in tumor settings, we performed downstream analyses on tumor scRNA-seq datasets using phenotype algebra. Our results demonstrate that SCellBOW effectively identifies distinct cell clusters, including malignant and non-malignant populations, reinforcing its applicability in tumor settings [Section Results, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
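
      Why these metrics need ground truth is easiest to see from the definition of one of them. The sketch below implements the adjusted Rand index (ARI) from its standard pair-counting formula using only the standard library; it is an illustration and not the benchmarking code used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of cells placed together in both, in truth only, in pred only.
    together_both = sum(comb(n, 2) for n in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(n, 2) for n in Counter(truth).values())
    together_pred = sum(comb(n, 2) for n in Counter(pred).values())
    expected = together_truth * together_pred / total_pairs
    maximum = 0.5 * (together_truth + together_pred)
    if maximum == expected:
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Cluster labels are arbitrary, so a relabeled but identical partition scores 1.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

      The `truth` argument is exactly what tumor datasets often lack, which is why annotated normal-tissue datasets were used for benchmarking.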

      Minor issues:

      (1) Labels of subplots within the manu/figure should be revised to ensure correct order (missing Figures 3a-d, 4b before 4a, etc).

      We thank the reviewer for pointing this out. We have corrected the figure labels and ensured that all subplots follow the correct order, aligning with the manuscript.

      (2) "reaffirmed the clinically known aggressiveness order, i.e., CLA >-MES >-PRO, where CLA succeeds the rest of the subtypes in aggressiveness48 (Figures 4c, d)...": "Fig. 4c, d" should be "Fig. 4e, f". Also please put Figure 4a before 4b. Overall the order of Figure 4 needs to be revised to match the order in the manu. Similar to Figure 6.

      We have corrected the figure reference to Fig. 4e, f and revised the order of Figure 4 to maintain consistency with the manuscript.

      (3) "Our results showed that SCellBOW learned latent representation of single-cells accurately captures the 'semantics' associated with cellular phenotypes and allows algebraic operations such as '+' and '-'." Figure 5f (SCellBOW performances on mCRPC) should also be cited here since Supplementary Figure 6 contains three datasets (GBM, BRCA, mCRPC) while in Figure 4 only GBM and BRCA were shown?

      We thank the reviewer for this suggestion. We have now cited Figure 5f in this section to ensure that all datasets, including mCRPC, are appropriately referenced.

      (4) Under the subheading "SCellBOW facilitates survival-risk attribution of tumor subpopulations", the lines start with "We refer to this as phenotype algebra. We utilized this ability to find an association between the embedding vectors, representing total tumor - a specific malignant cell cluster with tumor aggressiveness..." could be reduced a little bit especially the re-intro of phenotype algebra since the author has already discussed previously (under "overview of SCellBOW").

      We appreciate the feedback and have condensed this section to avoid redundancy while maintaining clarity in connecting phenotype algebra to survival-risk attribution.

      (5) "Most CD4+ T cells map to CL0 and CL9 (here, CL is used as an abbreviation for cluster) (Figure 3f)..." "(here, CL is used as an abbreviation for cluster)" this note could be moved forward to SF2 since CL is first introduced in SF2.

      We thank the reviewer for the suggestion. We have moved the definition of CL (cluster) to Supplementary Figure 2 (SF2), where it is first introduced, for improved clarity.

    1. Author response:

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have addressed several of the concerns in the responses below and are currently working on additional analyses to further strengthen the study. These results will be incorporated into the final version of the research paper.

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.

      While a recent bottleneck may have increased inbreeding, the strong and consistent genetic structuring we observe within populations is more indicative of predominant self-fertilization. To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has been subject to other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the mating system exerts the more pronounced effect on the observed genetic patterns, resulting in a clear and consistent genetic structure across populations.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study would be that the output modelling analysis is largely qualitative, which cannot be directly compared to the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-reported knowledge, and such findings themselves are not novel, although it may have become interesting if these findings were quantitatively integrated with their empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNPs data would be more interesting. For example, by ABC-based methods, authors can infer the split time between subpopulations identified in this study. If such split time is older than the recorded invasion date, the result supports the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location—similar to what we observed in Lantana camara—can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation.

      We thank the reviewer for the suggestion regarding the use of simulations based on genomic data from Lantana and for explaining the importance of it. We are currently conducting demographic simulations using genomic data from Lantana to estimate divergence times between the different flower colour variants. We believe this analysis will offer deeper insights and provide further clarity on the points raised by the reviewers.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Hardy-Weinberg Equilibrium (HWE) filtering is a commonly used step in SNP filtering analysis to exclude loci potentially under selection, thereby enriching for neutral variants and minimizing bias in downstream analyses. To ensure that our results are not influenced by selection-driven SNPs, we conducted the analysis both with and without applying the HWE filter. Notably, the number of SNPs retained did not drop significantly after filtering, and the overall patterns observed remained consistent across both approaches.
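
      The relationship between inbreeding and departures from HWE can be made concrete with a small calculation. The sketch below estimates the inbreeding coefficient F = 1 - H_obs / H_exp at a single biallelic SNP and converts it to a selfing rate s through the textbook equilibrium relation F = s / (2 - s). It is illustrative only and is not the filtering pipeline described in this response: the genotype counts are made up, the function names are hypothetical, and the conversion assumes that selfing is the only source of inbreeding.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """F = 1 - H_obs / H_exp at one biallelic locus, from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    h_exp = 2 * p * (1 - p)                # heterozygosity expected under HWE
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    h_obs = n_het / n                      # observed heterozygosity
    return 1 - h_obs / h_exp


def selfing_rate_from_f(f):
    """Invert the equilibrium relation F = s / (2 - s), giving s = 2F / (1 + F)."""
    return 2 * f / (1 + f)


# Toy counts (assumption): 45 AA, 10 Aa, 45 aa in 100 plants.
f_hat = inbreeding_coefficient(45, 10, 45)
s_hat = selfing_rate_from_f(f_hat)
print(f"F = {f_hat:.2f}, implied selfing rate = {s_hat:.2f}")
```

      With these toy counts, F is 0.8 and the implied selfing rate is about 0.89.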

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      The aim of the SLiM simulation was to demonstrate that the extreme genetic structuring observed in Lantana camara can plausibly arise in natural systems under predominant self-fertilization. For the simulation, we used mutation and recombination rates estimated for Arabidopsis thaliana, as these parameters are currently unknown for Lantana. We thank the reviewer for pointing this out and will add these details in the revised version. While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilization alone. The effect of selfing on the effective mutation rate is not currently incorporated in the simulations; we will examine this in detail.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We recognize that one of the key improvements needed for the manuscript is to provide experimental evidence supporting self-fertilization. To address this, we conducted a bagging experiment on Lantana camara inflorescences to prevent insect visitation and eliminate insect-mediated cross-fertilization. The results showed no significant difference in seed set between bagged and open-pollinated inflorescences, indicating that Lantana is predominantly self-fertilizing in India. This finding is consistent with our genetic data and will be included in the revised version of the manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The different flower colour variants are visually distinguishable. Our classification of these variants is not based on the colour of individual flowers at a single time point, but rather on the overall colour change pattern across the inflorescence over time. In other words, the temporal aspect of colour change has been considered in our grouping. For example, in the “yellow-pink” variant, flowers begin as yellow when young and gradually turn pink as they age. Importantly, variants that follow this pattern do not transition to an orange type at any stage, which distinguishes them from other colour types. The varieties that don't change colours are named based on the single flower colour like “orange”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. Both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although this was not sufficiently highlighted in the main text, which has now been improved.

      Our manuscript contains benchmarking against a challenging synthetic dataset in Figure 1; furthermore, both the synthetic dataset and the real-world thymus dataset have been analyzed in parallel using currently available TI tools (as detailed in the Supplementary Note). Other single-cell datasets (single-cell RNA-seq) were added in response to the reviewers' comments.

      One of the reviewers correctly points out that tviblindi goes against the philosophy of automated trajectory inference. This is correct; we believe that a new class of methods, complementary to fully automated approaches, is needed to explore datasets with unknown biology. tviblindi is meant to be a representative of this class of methods—a semi-automated framework that builds on features inferred from the data in an unbiased and mathematically well-founded fashion (pseudotime, homology classes, suitable low-dimensional representation), which can be used in concert with expert knowledge to generate hypotheses about the underlying dynamics at an appropriate level of detail for the particular trajectory or biological process.

      We would also like to mention that the algorithm and the workflow are not the sole results of the paper. We have thoroughly characterized human thymocyte development, where, in addition to expected biological endpoints, we found and characterized an unexpected activated thymic T-reg endpoint.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point by point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary Note, section 8.2: Comparison to state-of-the-art methods), namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024), and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by the reviewers and included an extended evaluation on publicly available datasets (Supplementary Note, section 10).

      Also, in the meantime we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (Bakardjieva M, et al. Tviblindi algorithm identifies branching developmental trajectories of human B-cell development and describes abnormalities in RAG-1 and WAS patients. Eur J Immunol. 2024 Dec;54(12):e2451004. doi: 10.1002/eji.202451004.).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      To satisfy both types of reader, the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We improved the Results text to better point the reader to the mathematical foundations in the Supplementary Note.

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript, we tested the framework on the scRNA-seq data from Park et al. 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we successfully applied the framework to imputed 2000-dimensional data. Furthermore, we successfully used tviblindi to investigate the bone marrow atlas scRNA-seq dataset of Zhang et al. (2024) and the atlas of mouse gastrulation of Pijuan-Sala et al. (2019). The idea behind tviblindi is to work without non-linear dimensionality reduction techniques, which reduce the data to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand, using linear dimensionality reduction techniques that effectively suppress noise in the data, such as PCA, is good practice (see also the response to reviewer 2). We have emphasized this in the revised version and added the results of the corresponding analysis (see Supplementary Note, section 9).

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of this topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We have also expanded the “sensitivity to hyperparameters” section 8.1 in response to reviewer 2.
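
      As an illustration of the kind of stability check described above, the sketch below computes a Spearman rank correlation between two pseudotime vectors obtained with different values of k. It uses only the Python standard library and is not the code from the Supplementary Note; the function names and the pseudotime values are made-up assumptions.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, i.e. the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical pseudotime of the same five cells for two choices of k.
pseudotime_k10 = [0.0, 1.2, 2.9, 4.1, 7.5]
pseudotime_k50 = [0.0, 1.0, 3.2, 3.9, 8.1]
print(spearman(pseudotime_k10, pseudotime_k50))
```

      A value close to 1 indicates that the ordering of cells is preserved across the two settings, even if the absolute pseudotime values differ.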

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
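
      The first two steps of this summary can be sketched on a toy graph. The example below computes the expected hitting time of a simple random walk from a start node to every other node, orients each edge from lower to higher pseudotime, and samples random walks on the resulting directed acyclic graph. It is a minimal illustration under stated assumptions, not tviblindi's implementation: the four-node graph, the function names, and the fixed-point solver are all hypothetical, and the persistent-homology clustering of walks is omitted.

```python
import random

# Toy undirected graph (assumption): origin 0, a hub 1, and two leaves 2 and 3.
neighbors = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}


def expected_hitting_times(origin, neighbors, tol=1e-12, max_sweeps=100_000):
    """Expected number of steps for a simple random walk started at
    `origin` to first reach each node (0 for the origin itself)."""
    result = {}
    for target in neighbors:
        # Solve h(v) = 1 + mean of h over neighbours of v, with h(target) = 0,
        # by repeated in-place sweeps until the values stop changing.
        h = {v: 0.0 for v in neighbors}
        for _ in range(max_sweeps):
            delta = 0.0
            for v, nbrs in neighbors.items():
                if v == target:
                    continue
                new = 1.0 + sum(h[u] for u in nbrs) / len(nbrs)
                delta = max(delta, abs(new - h[v]))
                h[v] = new
            if delta < tol:
                break
        result[target] = h[origin]
    return result


pseudotime = expected_hitting_times(0, neighbors)

# Keep only edges that point towards strictly larger pseudotime.
dag = {v: [u for u in nbrs if pseudotime[u] > pseudotime[v]]
       for v, nbrs in neighbors.items()}


def sample_walk(start, dag, rng):
    """Follow outgoing edges at random until a node without successors."""
    path = [start]
    while dag[path[-1]]:
        path.append(rng.choice(dag[path[-1]]))
    return path


rng = random.Random(0)
walks = [sample_walk(0, dag, rng) for _ in range(5)]
print(pseudotime)
print(walks)
```

      In this toy graph every walk passes through the hub and terminates in one of the two leaves, which play the role of candidate endpoints.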

      We thank the reviewer for the feedback and suggestions, which we have accommodated. We respond point by point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, by allowing for user bias leading to possible overfitting for a specific data.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data-driven framework for exploratory analysis of unknown data. There is always a risk of bias in this type of analysis, starting with experimental design, choice of hyperparameters in the downstream analysis, and expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge, which is difficult to include a priori in computational models.

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution at a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times, tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the k-NN graph, etc.

      The main weakness of the method is lack of benchmarking the method on real data and comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms, the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. Including those that specifically use persistent homology to discover trajectories (Rizvi et.al. published Nat Biotech 2017). Including those that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank published in Lange et.al Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data and comparison to the very many other previous trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work (especially due to its inability to handle the noise typical in real world data, as was even demonstrated in the little bit of application to real world data provided).

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024), and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by the reviewers and included an extended evaluation on publicly available datasets (Supplementary Note, section 10).

      Beyond general lack of benchmarking there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of robustness of the method, as well as how to ensure data-driven inference and avoid over-fitting would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved, and we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but they are an (unbiased) representation of what the data look like based on the diffusion model. This is a property of the data (or of the panel design) that has to be interpreted properly, rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality), the other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data-driven (although not fully automated). Anecdotally, in this particular case, the branching was discovered by a bioinformatician who knew nothing about the presence of beta-selection in the data.

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory), is a challenge for TI algorithms, and none of the tested methods gave a correct result. More importantly, there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default). In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We have improved the discussion of the robustness in the current version.  
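
      The default display threshold mentioned above can be illustrated with a minimal sketch. This is not tviblindi's code: the function name, the flat (non-hierarchical) grouping, and the toy counts are assumptions made for illustration. It only shows why a branch holding a small share of the simulated walks stays hidden until the threshold is lowered, i.e. until the user "zooms in".

```python
from collections import Counter


def visible_branches(walk_labels, min_fraction=0.10):
    """Return {branch: share of walks} for branches reaching min_fraction.

    walk_labels holds one (hypothetical) branch label per simulated walk.
    """
    total = len(walk_labels)
    if total == 0:
        return {}
    counts = Counter(walk_labels)
    return {
        label: n / total
        for label, n in counts.items()
        if n / total >= min_fraction
    }


# Toy data: 5000 walks split over three hypothetical branches.
labels = ["A"] * 4500 + ["B"] * 350 + ["C"] * 150

default_view = visible_branches(labels)            # 10% cutoff
zoomed_view = visible_branches(labels, 0.05)       # lower cutoff
```

      With the default cutoff only branch A is displayed; lowering the cutoff to 5% also reveals branch B, without any walk being re-simulated.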

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as a high degree of noise (e.g. ambient RNA), introduces a very large degree of noise so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason, otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can directly operate in high-dimensional space (thousands of dimensions) and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We have added this analysis to the study (Supplementary note, section 9).

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We have added a corresponding comment into the Supplementary note.  

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline" subsection).

      We thank the reviewer for the suggestion and we have accommodated it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      (2) Regarding the discussion in Figure 4 the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality, it can be a checkpoint as authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe authors can consider running other algorithms on those cells and see which tracks they identify and if the directionality matches with the tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle 3 they failed to discover the negative selection branch (Monocle 3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms. 

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses:

      -  Benchmark against existing trajectory inference methods.

      -  Benchmark on scRNA-seq data or an explicit statement that, unlike existing methods, tviblindi is not designed for such data.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      -  Systematic evaluation of the effects of hyper-parameters on the performance of tviblindi (as mentioned above, there is at least one hyper-parameter, the number k to construct the k-NN graphs).

      This is described in Supplementary Note section 8.1.

      Recommendations for improving the writing and presentation:

      -  The GitHub link to the algorithm which is currently hidden in the Methods should be moved to the abstract and/or a dedicated section on code availability.

      -  The presentation of the persistent homology approach used for random walk clustering should be improved (see public comment above).

      This is described extensively in the Supplementary Note.

      -  A very minor point (can be ignored by the authors): consider renaming the algorithm. At least for me, it's extremely difficult to remember.

      We choose to keep the original name.

      Minor corrections to the text and figures:

      -  Labels and legend texts are too small in almost all figures.

      Reviewer #2 (Recommendations For The Authors):  

      (1) On page 3: "(2) Analysis is performed in the original high-dimensional space avoiding artifacts of dimensionality reduction." In mass cytometry data where there is no issue of dropouts, one may choose proteins such that they are not correlated with each other making dimensionality reduction techniques less relevant. But in the context of an unbiased assay such as single-cell RNA-sequencing (scRNA-seq), one measures all the genes in a cell so dimensionality reduction can help resolve the redundancy in the feature space due to correlated/co-regulated gene expression patterns. This assumption forms the basis of most methods in scRNA-seq. More importantly, in scRNA-seq data the dropouts and ambient molecules in mRNA counts result in so much noise that modeling cells in the full gene expression space is highly problematic. So the authors are requested to discuss in detail how they would propose to deal with noise in scRNA-seq data.

      On this note, the authors mention in Supplementary Note 9 (Analysis of human thymus single-cell RNA-seq data): "Imputed data are used as the input for the trajectory inference, scaled counts (no imputation) are shown in line plots". The line plots indicate the gene expression trends along the obtained pseudotime. The authors use MAGIC to impute the data, and we request the authors to mention this in the Methods section (currently one must look through the code on Supplementary Note 1.3 to find this). Data imputation in single-cell RNA-seq data are intended to enable quantification of individual gene expression distribution or pairwise gene associations. But when all the genes in an imputed data are used for visualization, clustering or trajectory inference, the averaging effect will compound and result in severely smoothed data that misses important differences between cell states. Especially, in the case of MAGIC, which uses a transition matrix raised to a power, it is over-smoothing of the data to use a transition matrix smoothed data to obtain another transition matrix to calculate the hitting time (or simulate random walks). Second, the authors' proposal to use scaled counts to study gene trends cannot be generalized to other settings due to drop out issue. Given the few genes (and only one branch) that are highlighted in Figure 7D-G and Figure 31 in Supplementary Note, it is hard to say if scaling raw values would pick up meaningful biology robustly here for other branches.

      We recommend that this data be reanalyzed with non-imputed data used for trajectory inference and imputed gene expression used for line plots.

      As stated above in the public review, we reanalyzed the scRNA Seq data using a more standard approach (first 50 principal components). We have also analyzed two additional scRNA Seq datasets (Section 1 and section 10 of Supplementary Note)

      On the same note, the authors use Seurat's CellCycleScoring to obtain the cell cycle phase of each cell and later use ScaleData to regress them out. While we agree that it is valuable to remove cell cycle effect from the data for trajectory inference (and has been used previously in other methods), the regression approach employed in Seurat's ScaleData is not appropriate. It is an aggressive approach that severely changes the expression pattern of many genes and can result in new artifacts (false positives) in the data. We recommend that the authors explore this more and consider using a more principled alternative such as fscLVM (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1334-8).

      Cell cycle correction is an open problem (Heumos, Nat Rev Genetics, 2023).

      Here we use an (arguably aggressive) approach to make the presentation more straightforward. The cells we are interested in here (end #6) are not dividing, and the regression does not change the conclusions drawn in the paper.

      (2) The figures provided are so low in resolution that it is practically impossible to correctly interpret many of the conclusions and references made in the figures (especially Figure 3 in the main text).

      Resolution of the Figures was improved.

      (3) There are many aspects of the method that enable easy user biases and can lead to substantial overfitting of the data.

      a. On page 7: "The topology of the point cloud representing human T-cell development is more complex ... and does not offer a clear cutoff for the choice of significant sparse regions. Interactive selection allows the user to vary the resolution and to investigate specific sparse regions in the data iteratively." This implies that the method enables user biases to be introduced into the data analysis. While perhaps useful for exploration, quantitative trajectory assessment using such approach can be faulty when the user (A) may not know the underlying dynamics (B) forces preconceived notion of trajectory.

      The authors should consider making the trajectory inference approach less dependent on interactive user input and show that the trajectory results are robust to any choices the user may make. It may also help if the authors provide an effective guide and mention clearly what issues could result due to the use of such thresholds.

      As explained in the response in public reviews, tviblindi is not designed as a fully automated TI tool, but as a data driven framework for exploratory analysis of unknown data. 

      There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.  To specifically address the points raised by the reviewer:

      “(A) may not know the underlying dynamics” - tviblindi is designed to perform exploratory analysis of the unknown underlying dynamics. We showcase in the study how this can be performed and we highlight possible cases which can be resolved expertly (spurious connections (doublets), different scales of resolution (beta selection)). Crucially, compared to other TI methods, tviblindi offers a clear mechanism to discover, focus on and resolve these issues, which would (and do) contaminate the trajectories discovered fully automatically by tested methods (cf. the beta selection, or the development of plasmacytoid dendritic cells (PDCs); Supplementary note, section 10.1).

      “(B) forces preconceived notion of trajectory” - user interaction in tviblindi does not force a preconceived notion of the trajectory. The random walks are simulated before the interactive step in an unbiased manner. During the interactive step the user adjusts trajectory specific resolution - incorrect choice of the resolution may result in either merging distinct trajectories into one or over separating the trajectories (which is arguably much less serious). However the interactive step is designed to deal with exactly this kind of challenge. We showcase (e.g. beta selection, or PDCs development) how to address the issue - tviblindi allows us to investigate deeper structure in any considered trajectory.
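
      To make concrete what simulating walks before any user interaction can look like, here is a minimal sketch. It is not tviblindi's implementation: the transition rule (a uniform choice among k-NN neighbours with strictly larger pseudotime), the function name and the toy graph are all assumptions made for illustration. The point is only that the walks depend on the graph and the pseudotime, not on any later choice of resolution.

```python
import random


def simulate_walk(neighbors, pseudotime, start, rng, max_steps=1000):
    """Walk forward in pseudotime on a neighbour graph until no step is left.

    neighbors: dict mapping node -> list of neighbouring nodes (e.g. k-NN).
    pseudotime: sequence giving a pseudotime value per node.
    """
    path = [start]
    current = start
    for _ in range(max_steps):
        # Assumed rule: only neighbours later in pseudotime are admissible.
        forward = [n for n in neighbors[current]
                   if pseudotime[n] > pseudotime[current]]
        if not forward:
            break  # reached a candidate endpoint
        current = rng.choice(forward)
        path.append(current)
    return path


# Toy graph with one origin (0), two routes (via 1 or 2) and one endpoint (4).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
pseudotime = [0.0, 0.3, 0.4, 0.7, 1.0]

rng = random.Random(0)
walks = [simulate_walk(neighbors, pseudotime, 0, rng) for _ in range(100)]
```

      In this toy setting every walk runs from node 0 to node 4. Grouping the walks (for example by the route they take) is a separate, later step and does not alter the walks themselves.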

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools. It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner, including pseudotime, homology classes, and appropriate low-dimensional representations. These can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      b. In Figure 4, the authors discuss the trajectory of cells emanating from CD3 negative double positive stage and entering apoptotic phase and mention tviblindi may give "the false impression that cells may pass through an apoptotic phase into a later developmental stage" and propose that the interactive version of tviblindi can help user zoom into (increase resolution) this phenomenon and identify that there are in fact two trajectories in one. Given this, how do the other trajectories in the data change if a user manually adjusts the resolution? A quantification of the robustness is important. Also, it appears that a more careful data clean up could avoid such pitfalls where the algorithm infers trajectory based on mixed phenotype and the user would not have to manually adjust the resolution to obtain clear biological conclusion. We note that the original publication of this data did such "data clean up" using simple diffusion map based dimensionality reduction which the authors boast they avoid. There is a reason for this dimensionality reduction (distinguishing signal from noise), even in CyTOF data, let alone its importance in single cell data.

      The reviewer is concerned about two different, but intertwined issues we wish to untangle here. First, data clean-up is typically done on the premise that dead cells are irrelevant and they are a source of false signals. In the case of the thymocytes in the human thymus this premise is not true. Apoptotic cells are a legitimate (actually dominant) fate of the development and thus need to be represented in the TI dataset. Their biological behavior is however complex as they stop expressing proteins and thus lose their surface markers gradually, as dictated by the particular protein degradation kinetics. So can we clean up dead and dying cells better? Yes, but we don't want to do it since we would lose cells we want to analyze. Second, do trajectories change when we zoom into the data? No, only the level of detail presented visually changes. Since we calculate 5000 trajectories in the dataset, we need to aggregate them already for the hierarchical clustering visualization. Note that Figure 4, panel A highlights 159 trajectories selected in V. group. Zooming in means that the hierarchy of trajectories within V. group is revealed (panel D, groups V.a and Vb.) and can be interpreted on the vaevictis and lineplot graphs (panel E, F). 

      c. In the discussion, the authors write "[tviblindi] allows the selection and grouping of similar random walks into trajectories based on visual interaction with the data". This counters the idea of automated trajectory inference and can lead to severe overfitting.

      As explained in reply to Q3, our aim was NOT to create a fully automated trajectory inference tool. Even more, in our experience we realized that all current tools are taking this fully  automated approach with a search for an “ideal” set of hyperparameters. This, in our experience,  leads to a “blackbox” tool that is difficult to interpret for the expert in the biological field. To respond to this need we designed a modular approach where the results of the TI are presented and the expert can interact with them to focus the visualization and to derive interpretation. Our interactive concept is based on 15 years of experience with the data analysis in flow cytometry, where neither manual gating nor full automation is the ultimate solution but smart integration of both approaches eventually wins the game.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools.  It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner. These features include pseudotime, homology classes, and appropriate low-dimensional representations. These features can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      d. The authors provide some comment on the robustness to the relaxation parameter for witness complex construction in Supplementary Note Section 8.1.2 but it is limited given the importance of this parameter and a more thorough investigation is recommended. We request the authors to provide concrete examples with figures of how changing alpha2 parameter leads to simplicial complexes of different sizes and an assessment of contexts in which the parameter is robust and when not (in both simulated and publicly available real data). Of note, giving the users a proper guide for parameter choice based on these examples and offering them ways to quantify robustness of their results may also be valuable.

      Section 8 in Supplementary Note was extended as requested.

      e. The authors are requested for an assessment of possible short-circuits (e.g. cells of two distantly related phenotypes that get connected erroneously in the trajectory) in the data, and how their approach based on persistent homology deals with it.

      If a short circuit results in a (spurious) alternative trajectory, the persistent homology approach allows us to distinguish it from genuine trajectories that do not follow the short circuit. This prevents contamination of the inferred evolution by erroneous connections. The ability to distinguish and separate distinct trajectories with the same fate is a major strength of this approach (e.g., the trajectory through doublets or the trajectories around checkpoints in thymocytes’ evolution).

      (4) The authors propose vaevictis as a new visualization tool and show its performance compared to the standard UMAP algorithm on a simulated data set (Figure 1 in Supplementary Notes). We recommend a more comprehensive comparison between the two algorithms on a wide array of publicly available single-cell datasets. As well as comparison to other popular dimensionality reduction approaches like force directed layouts, which are the most widely used tool specifically to visualize trajectories.

      We added Section 10 to Supplementary Note that presents multiple comparisons of this kind. It is important to note that tviblindi works independently of visualization and any preferred visualization can be used in the interactive phase (multiple visualisation methods are implemented).

      (5) In Supplementary Note 8.2, the authors compare tviblindi against the other methods. We recommend that the authors quantify the comparison or expand on their assessments in real biological data. For example, in comparison against Palantir and VIA the authors mention "... discovers candidate endpoints in the biological dataset but lacks toolbox to interrogate subtle features such as complex branching" and "fails to discover subtle features (such as Beta selection)" respectively. We recommend that the authors make these comparisons more precise or provide quantification. While the added benefit of interactive sessions of tviblindi may make it more user friendly, the way tviblindi appears to enable analysis of subtle features (e.g. Figure 1H) should be possible in Palantir or VIA as well.

      We extended the comparisons and presented them in Section 8 and 10 in Supplementary Note.  

      (6) The notion of using random walk simulations to identify terminal (and initial states) has been previously used in single-cell data (CellRank algorithm: https://www.nature.com/articles/s41592-021-01346-6). We request the authors to compare their approach to CellRank.

      We compared our algorithm to the CellRank successor CellRank 2 (see section 8.2, Supplementary Note)

      (7) The notion of using persistent homology to discover trajectories has been previously used in single cell data (https://pubmed.ncbi.nlm.nih.gov/28459448/). We request a comparison to this approach.

      The proposed algorithm was not able to accommodate the large datasets we used.

      scTDA (Rizvi, Camara et al. Nat. Biotechnol. 2017) has not been updated for 6 years. It is not suited for complex atlas-sized datasets both in terms of performance and utility, with its limited visualization tools. It also lacks capabilities to analyze individual trajectories.

      (8) In Figure 3B, the authors visualize the endpoints and simulated random walks using the connectome. There is no edge from start to the apoptotic cells here. It is not clear why? If they are not relevant based on random walks, can the user remove them from analysis? Same for the small group of pink cells below initial point.

      The connectome is a fully automated approach (similar to PAGA) which gives a basic overview of the data. It is not expected to be able to compete with the interactive pipeline of tviblindi for the same reasons as the fully automated methods (difficult to predict the effect of hyperparameters).

      (9) In Supplementary Figure 3, in relation to "Variants of trajectories including selection processes" the authors mention that there is a spurious connection between CD4 single positive, and the doublet set of cells. The authors mention that the presence of dividing cells makes it difficult to remove the doublets. We request the authors to discuss why. For example, the authors seem to have cell cycle markers (e.g. Ki67, pH3, Cyclin) and one would think that coupled with DNA intercalator 191/193Ir one could further clean up the data. Can the authors employ alternative toolkits such as doublet detection methods?

      To address this issue, we do remove doublets with illegitimate cell barcodes (e.g. we remove any two cells from two samples with different barcodes which present with a double barcode). Although there are computational doublet removal approaches for mass cytometry (Bagwell, Cytometry A 2020), mostly applied to peripheral blood samples (where cell division is not present under steady state immune system conditions), these are not well suited for situations where dividing samples occur (Rybakowska P, Comput Struct Biotechnol J. 2021), which is the case of our thymocyte samples. Furthermore, there are other situations where doublet formation is not an accident, but rather a biological response (Burel JG, Cytometry A 2020). Thus, the doublet cell problem is similar to the apoptotic cell problem discussed earlier.

      We could remove cells with the double DNA signal, but this would remove not only accidental doublets but also the legitimate (dividing) cells. So the question is how to remove the illegitimate doublets but not the legitimate?

      Of note, the trajectory going through doublets does not affect the interpretation of other trajectories as it is readily discriminated by persistent homology and thus random walks passing through this (spurious) trajectory do not contaminate the markers’ evolution inferred for legitimate trajectories.

      We therefore prefer to remove only the barcode illegitimate and keep all others in analysis, using the expert analysis step also to identify (using the cell cycle markers plus other features) the artificially formed doublets and thus spurious connections.
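
      The barcode-based filter described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes one barcode channel per sample, an arbitrary positivity threshold, and hypothetical event and channel names. It removes only events that are positive for more than one sample barcode (or for none) and deliberately ignores the DNA signal, so dividing cells are kept.

```python
def is_barcode_legitimate(barcode_signal, threshold=1.0):
    """True if exactly one sample barcode channel is positive.

    barcode_signal: dict mapping a (hypothetical) barcode channel name to
    its measured intensity for one event.
    """
    positive = [name for name, value in barcode_signal.items()
                if value >= threshold]
    return len(positive) == 1


# Toy events: e2 carries two sample barcodes and is treated as a doublet.
events = [
    {"id": "e1", "barcodes": {"sample_A": 5.2, "sample_B": 0.1}},
    {"id": "e2", "barcodes": {"sample_A": 4.8, "sample_B": 3.9}},
    {"id": "e3", "barcodes": {"sample_A": 0.0, "sample_B": 6.1}},
]

kept = [e["id"] for e in events if is_barcode_legitimate(e["barcodes"])]
```

      Doublets formed within a single sample pass this filter by construction, which is why the remaining spurious connections are left to the expert analysis step.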

      (10) The authors should discuss how the gene expression trend plots are made (e.g. how are the expression averaged? Rolling mean?).

      The development of those markers is shown as a line plot connecting the average values of a specific marker within a pseudotime segment. By default, the pseudotime values are divided into uniform segments (each containing the same number of points) whose number can be changed in the GUI. To focus on either early or late stages of the development, the segment division can be adjusted in GUI. See section 6 of the Supplementary Note.
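
      The segment averaging described above can be sketched as follows. This is an illustrative reimplementation, not the code behind the GUI: the function name is hypothetical, and placing each point at the mean pseudotime of its segment is an assumption. Cells are sorted by pseudotime and split into segments holding (as nearly as possible) the same number of cells, and the marker is averaged within each segment.

```python
def segment_means(pseudotime, marker, n_segments=4):
    """Average a marker within equal-count pseudotime segments.

    Returns a list of (mean pseudotime, mean marker value) pairs, one per
    segment, ordered by pseudotime; these are the points a line plot joins.
    """
    if len(pseudotime) != len(marker):
        raise ValueError("pseudotime and marker must have the same length")
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    n = len(order)
    n_segments = min(n_segments, n)  # never create empty segments
    points = []
    for s in range(n_segments):
        idx = order[s * n // n_segments:(s + 1) * n // n_segments]
        mean_t = sum(pseudotime[i] for i in idx) / len(idx)
        mean_m = sum(marker[i] for i in idx) / len(idx)
        points.append((mean_t, mean_m))
    return points


# Toy data: eight cells, marker rising with pseudotime.
pt = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
expr = [1, 9, 5, 3, 7, 2, 8, 4]
trend = segment_means(pt, expr, n_segments=4)
```

      Increasing `n_segments` gives a finer but noisier trend, which mirrors the adjustable number of segments mentioned above.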

      Reviewer #3 (Recommendations For The Authors):

      The overall figure quality needs to be improved. For example, I can barely see the text in Figure 3c.

      Resolution of the Figures was improved.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcription factors of Pseudomonas aeruginosa PAO1. The authors utilized ChIP-seq to profile TFs binding site information across the genome, demonstrating diverse regulatory relationships among them via hierarchical networks with three levels. They further constructed thirteen ternary regulatory motifs in small subs and co-association atlas with 7 core associated clusters. The study also uncovered 24 virulence-related master regulators. The pan-genome analysis uncovered both the conservation and evolution of TFs with P. aeruginosa complex and related species. Furthermore, they established a web-based database combining both existing and novel data from HT-SELEX and ChIP-seq to provide TF binding site information. This study offered valuable insights into studying transcription regulatory networks in P. aeruginosa and other microbes.

      Strengths:

      The results are presented with clarity, supported by well-organized figures and tables that not only illustrate the study's findings but also enhance the understanding of complex data patterns.

      Thank you for your valuable feedback on our paper exploring the transcription regulatory networks in P. aeruginosa.

      Weaknesses:

      The results of this manuscript are mainly presented in systematic figures and tables. Some of the results need to be discussed as an illustration how readers can utilize these datasets.

      We appreciate the valuable suggestion about enhancing the practical aspects of our manuscript. We have expanded the discussion section to include more detailed explanations of how these datasets can be utilized in practical applications. 

      Reviewer #2 (Public review):

      In this work, the authors comprehensively describe the transcriptional regulatory network of Pseudomonas aeruginosa through the analysis of transcription factor binding characteristics. They reveal the hierarchical structure of the network through ChIP-seq, categorizing transcription factors into top-, middle-, and bottom-level, and reveal a diverse set of relationships among the transcription factors. Additionally, the authors conduct a pangenome analysis across the Pseudomonas aeruginosa species complex as well as other species to study the evolution of transcription factors. Moreover, the authors present a database with new and existing data to enable the storage and search of transcription factor binding sites. The findings of this study broaden our knowledge on the transcriptome of P. aeruginosa. This study sheds light on the complex interconnections between various cellular functions that contribute to the pathogenicity of P. aeruginosa, along with the associated regulatory mechanisms. Certain findings, such as the regulatory tendencies of DNA-binding domain-types, provide valuable insights on the possible functions of uncharacterized transcription factors and new functions of those that have already been characterized. The techniques used hold great potential for discovery of transcription factor functions in understudied organisms as well.

      The study would benefit from a more clear discussion on the implications of various findings, such as binding preferences, regulatory preferences, and the link between regulatory crosstalk and virulence. Additionally, the pangenome analysis would be furthered through a discussion of the divergence of the transcription factors of P. aeruginosa PAO1 across species in relation to the findings on the hierarchical structure of the transcriptional regulatory network.

      Thank you for your positive feedback and suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      (1) It appears that many TFs are conserved among bacteria, archaebacteria, fungi, plants, and animals. Does this mean these TFs in bacteria could be the ancestors of TFs in fungi, plants, and animals? If we fetch these TFs out and build an evolutionary tree, can we visualize the three kingdoms as well?

      Thank you for this comment. While many TFs are conserved across bacteria, archaea, fungi, plants, and animals, this conservation does not necessarily imply a direct ancestral relationship. Instead, it may reflect the fundamental importance of certain domains and regulatory mechanisms, which could have arisen from a common ancestral system or through convergent evolution. Taking the TF PA2032 as an example and building an evolutionary tree with PAO1 set as the root, we can visualize these kingdoms in a single tree. We added this content to the revised manuscript. Please see Figure S7D and Lines 404-411.

      “The phylogenetic tree of PA2032 across bacteria, archaea, fungi, plants, and animals, with PAO1 as the root, revealed that the bacterial TFs (purple) show a high degree of conservation within prokaryotes, suggesting a fundamental role in core regulatory processes. In contrast, eukaryotic TFs (fungi, plants, and animals) form distinct clades with longer branch lengths, indicating significant divergence and specialization during eukaryotic evolution. These findings suggest that while this TF is conserved across domains of life, its functional roles and regulatory mechanisms have undergone substantial diversification in eukaryotes.”

      (2) Can the authors give an indication how could we employ the findings of this study in designing next generation of antimicrobial agents?

      Thank you for this important suggestion. We have provided this content in the discussion part. Please see Lines 481-492.

      “The extensive datasets generated in this study offer valuable insights into understanding and targeting P. aeruginosa pathogenicity. The genome-wide binding profiles can be systematically analyzed through our hierarchical regulatory network framework to decode complex virulence mechanisms. The virulence-related master regulators and core regulatory clusters identified in this study highlighted key nodes of transcriptional control. Understanding these regulatory relationships is particularly valuable for identifying targets whose modulation would significantly impact virulence while accounting for potential compensatory mechanisms. This knowledge base thus provides a foundation for developing targeted approaches to combat P. aeruginosa infections, moving beyond traditional antibiotic strategies toward more sophisticated interventions based on regulatory network manipulation.”

      Minor:

      (1) Lines 178-180: It would strengthen the discussion to include a few additional references that support the claims made in this section, providing a more comprehensive context for the readers.

      Yes. We have added more citations (1-5) (Nos. 1-5 in the references at the end of the rebuttal) to support the claims. Please see Line 182.

      (2) Line 198: You mention 'seven' motifs containing toggle switches, but Fig.3 actually displays eight motifs. Please revise this discrepancy to ensure consistency between the text and the figure.

      Yes. We have revised the wording to “eight”. Please see Line 200.

      (3) Figure 3A: Consider adding a diagram or legend that represents the colors associated with each DNA-binding domain (DBD) family.

      Thank you for your suggestion. The DBD colors match the legend in Figure S3. We have added this legend to Figure 3A.

      Reviewer #2 (Recommendations for the authors):

      Line 21: The use of the abbreviation 'TF' should be done at the first instance of 'transcription factor'.

      Yes. We have revised it. Please see Line 21.

      Line 74: The purpose of this paragraph is slightly unclear. It is recommended that appropriate modifications are made.

      We are sorry for the confusion. The purpose of this paragraph was to introduce the major virulence pathways in P. aeruginosa and mention the important role of TRN in these pathways. We have modified it to make it clearer. Please see Lines 74-75.

      “P. aeruginosa employs diverse virulence pathways to establish successful infection, with QS being one of the major mechanisms involving the expression of many virulence genes.”

      Line 113: How were these 172 TFs selected?

      Thank you for raising this question. In a previous study, we performed HT-SELEX to characterize the DNA-binding motifs of all TFs in P. aeruginosa PAO1, successfully identifying binding sequences for 182 TFs. To further elucidate the binding landscapes of the rest, we performed ChIP-seq on the remaining TFs (172 TFs in total with high-quality ChIP-seq libraries). Please see Lines 100-101 in the revised manuscript.

      Line 119: Defining other features, namely downstream and include Feature, would be helpful.

      Thank you for your suggestion. We have added definitions for all peak annotation categories to the legend. Please see Lines 569-574.

      “Annotation heatmap of all peak distribution with 6 locations: Upstream, where the peak is located entirely upstream of the gene; Downstream, where the peak is positioned completely downstream of the gene; Inside, where the peak is entirely contained within the gene body; OverlapStart, where the peak overlaps with the 5' end of the gene; OverlapEnd, where the peak overlaps with the 3' end of the gene; and IncludeFeature, where the peak completely encompasses the gene.”
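      The six categories in this legend can be sketched as a small classifier. This is only an illustrative sketch, not the annotation tool used in the study: the function name `classify_peak`, the closed-interval coordinate convention, the strand handling, and the tie-breaking (a peak with exactly the gene's coordinates counts as IncludeFeature) are all assumptions made here.

```python
def classify_peak(peak_start, peak_end, gene_start, gene_end, strand="+"):
    """Assign a peak to one of six locations relative to a gene.

    Hypothetical helper. Coordinates are closed genomic intervals with
    start <= end; strand is "+" or "-" and decides which side is 5'.
    """
    if peak_start > peak_end or gene_start > gene_end:
        raise ValueError("intervals must satisfy start <= end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    plus = strand == "+"

    # Peak lies entirely to one side of the gene.
    if peak_end < gene_start:
        return "Upstream" if plus else "Downstream"
    if peak_start > gene_end:
        return "Downstream" if plus else "Upstream"

    # Peak completely encompasses the gene.
    if peak_start <= gene_start and peak_end >= gene_end:
        return "IncludeFeature"

    # Peak is entirely contained within the gene body.
    if peak_start >= gene_start and peak_end <= gene_end:
        return "Inside"

    # Remaining cases are partial overlaps with one gene end.
    if peak_start < gene_start:
        return "OverlapStart" if plus else "OverlapEnd"
    return "OverlapEnd" if plus else "OverlapStart"
```

      For a plus-strand gene spanning 100-200, a peak at 90-110 would be labelled OverlapStart; the same peak against a minus-strand gene would be labelled OverlapEnd, because the 5' end is then at coordinate 200.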

      Line 129: The distribution type of AraC-type TFs is unclear - it is mentioned that AraC has a 'broad distribution', but it is later stated that it has a 'narrow distribution'.

      We are sorry for this mistake, and we have revised the example for “broad distribution”, which is Cor_CI instead of AraC. Please see Lines 132-135.

      Line 161: 'h value' here may need to be modified to 'absolute h value'.

      Yes. We have revised it. Please see Line 164.

      Line 502: "s The DNA" needs to be corrected.

      Yes. We have revised it. Please see Line 514.

      Line 515: It would be helpful to readers if the reference used for these pathways was cited.

      Yes. We have added the review reference (Shao et al, 2023) related to these pathways(6) (the 6th reference at the end of the rebuttal). Please see Line 527.

      Line 558: "Translation start site" needs to be corrected to "Transcription start site"

      The “TSS” here does indeed refer to “Translation start site”, so no change was needed.

      Line 593. "Virulent" pathways needs to be corrected to "virulence" pathways.

      Yes. We have revised it. Please see Line 609.

      Line 604: The type of categorization based on which the proportion of genes is displayed needs to be mentioned.

      Yes, we agree. We have added the type of categorization in the legend. Please see Lines 621-627.

      “Figure 6. Conservation and variability of TFs in PAO1. (A). The pie chart shows the proportions of genes categorized by their presence across P. aeruginosa strains for all genes. (B). The pie chart shows the distribution of TFs identified from PAO1 across different conservation categories. (C). The bar plot of the proportion for non-core TFs. Genes are categorized based on their presence frequency across P. aeruginosa strains: Core genes (present in 99% ~ 100% strains), Soft core genes (present in 95% ~ 99% strains), Shell genes (present in 15% ~ 95% strains), and Cloud genes (present in 0% ~ 15% strains).”
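      The presence-frequency categories in this legend can be expressed as a simple threshold rule. The sketch below is illustrative only: the function name `pangenome_category` is hypothetical, and because the legend's ranges share their boundary values, it assumes that each boundary (99%, 95%, 15%) belongs to the higher category.

```python
def pangenome_category(n_present, n_strains):
    """Classify a gene by the share of strains that carry it.

    Hypothetical helper. Assumes lower-inclusive boundaries:
    Core >= 99%, Soft core >= 95%, Shell >= 15%, otherwise Cloud.
    """
    if n_strains <= 0 or not 0 <= n_present <= n_strains:
        raise ValueError("need 0 <= n_present <= n_strains and n_strains > 0")
    # Integer arithmetic avoids floating-point issues at the boundaries.
    percent_times_strains = n_present * 100
    if percent_times_strains >= 99 * n_strains:
        return "Core"
    if percent_times_strains >= 95 * n_strains:
        return "Soft core"
    if percent_times_strains >= 15 * n_strains:
        return "Shell"
    return "Cloud"
```

      Under these assumptions, a TF found in 97 of 100 strains would be a Soft core gene, and one found in 14 of 100 strains would be a Cloud gene.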

      Reference:

      (1) Liang H, Deng X, Li X, Ye Y, Wu M. 2014. Molecular mechanisms of master regulator VqsM mediating quorum-sensing and antibiotic resistance in Pseudomonas aeruginosa. Nucleic acids research 42:10307-10320.

      (2) Jones CJ, Ryder CR, Mann EE, Wozniak DJ. 2013. AmrZ modulates Pseudomonas aeruginosa biofilm architecture by directly repressing transcription of the psl operon. Journal of bacteriology 195:1637-1644.

      (3) Hickman JW, Harwood CS. 2008. Identification of FleQ from Pseudomonas aeruginosa as a c‐di‐GMP‐responsive transcription factor. Molecular microbiology 69:376-389.

      (4) Déziel E, Gopalan S, Tampakaki AP, Lépine F, Padfield KE, Saucier M, Xiao G, Rahme LG. 2005. The contribution of MvfR to Pseudomonas aeruginosa pathogenesis and quorum sensing circuitry regulation: multiple quorum sensing‐regulated genes are modulated without affecting lasRI, rhlRI or the production of N‐acyl‐L‐homoserine lactones. Molecular microbiology 55:998-1014.

      (5) Lizewski SE, Lundberg DS, Schurr MJ. 2002. The transcriptional regulator AlgR is essential for Pseudomonas aeruginosa pathogenesis. Infection and immunity 70:6083-6093.

      (6) Shao X, Yao C, Ding Y, Hu H, Qian G, He M, Deng X. 2023. The transcriptional regulators of virulence for Pseudomonas aeruginosa: Therapeutic opportunity and preventive potential of its clinical infections. Genes & Diseases 10:2049-2063.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Previous studies in mammals and other vertebrates have shown that a noninvasive measure of cochlear tuning, based on the latency derived from stimulus-frequency otoacoustic emissions, provides a reasonable, and non-invasive, estimate of cochlear tuning. This valuable study confirms that finding in a new species, the budgerigar, and provides convincing support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored primarily in mammals. The study's remaining claims of a mismatch between behavioral frequency selectivity and cochlear tuning are based on old behavioral data, and collected in an extreme frequency region at the edge of the limits of hearing. Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, and the evidence for these claims is weak.

      We appreciate the detailed summary of our paper by the editors highlighting its strengths. As described in the following responses, we added additional evidence to the Introduction supporting that budgerigars have (1) unusual behavioral frequency tuning compared to other bird species and (2) unusual behavioral tuning results in budgerigars are not readily explainable by the audiogram. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars. Moreover, that the behavioral data are “old” seems not particularly relevant considering that the same behavioral methods are still widely used in animal research, as elaborated upon in the responses below. We suggest the term “previously published” to clarify the behavioral data used in our analyses.

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning now avoid possible circularity and are obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar.

      Weaknesses:

      * In mammals, accurate prediction of neural Q_ERB from otoacoustic N_SFOAE involves the application of species-invariance of the tuning ratio combined with an attempt to compensate for possible species differences in the location of the so-called apical-basal transition (for a review, see Shera & Charaziak, Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 2019; 9:pii a033498. doi: 10.1101/cshperspect.a033498; in particular, the text near Eq. 2 and the value of CFa|b).

      Despite this history, the manuscript makes no mention of the apical-basal transition, its possible role in birds, or why it was ignored in the present analysis. As but one result, the comparative discussion of the tuning ratio (paragraph beginning on lines 383) is incomplete and potentially misleading. Although the paragraph highlights differences in the tuning ratio across groups, perhaps these differences simply reflect differences in the value of CFa|b. For example, if the cochlea of the budgerigar is assumed to be entirely "apical" in character (so that CFa|b is around 7-8 kHz), then the budgerigar tuning ratios appear to align remarkably well with those previously obtained in mammals (see Shera et al 2010, Fig 9).

      We added sections on the apical-basal transition to the Results and Discussion, including how this concept might apply in budgerigars and other birds.

      * For the most part, the authors take previous behavioral results in budgerigar at face value, attributing the discrepant behavioral results to hypothesized "central specializations for the processing of masked signals". But before going down this easy road, the manuscript would be stronger if the authors discussed potential issues that might affect the reliability of the previous behavioral literature. For example, the ANF data show that thresholds rise rapidly above about 5 kHz. Might the apparent broadening of the behavioral filters arise as a consequence of off-frequency listening due to the need to increase signal levels at these frequencies? Or perhaps there are other issues. Inquiring readers would appreciate an informed discussion.

      This is a good point, also raised by reviewer 2, that declining audibility above 4 kHz could impact behavioral tuning estimates. On the other hand, other bird species with highly similar audiograms to budgerigars show conventional behavioral tuning that increases in sharpness relatively slowly and monotonically for higher frequencies. Thus, the unusual pattern of behavioral tuning in budgerigars is not fully explainable by the audiogram. We added a section to the Introduction highlighting these points.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes two new sets of data involving budgerigar hearing: 1) auditory-nerve tuning curves (ANTCs), which are considered the 'gold standard' measure of cochlear tuning, and 2) stimulus-frequency otoacoustic emissions (SFOAEs), which are a more indirect measure (requiring some assumptions and transformations to infer cochlear tuning) but which are non-invasive, making them easier to obtain and suitable for use in all species, including humans. By using a tuning ratio (relating ANTC bandwidths and SFOAE delay) derived from another bird species (chicken), the authors show that the tuning estimates from the two methods are in reasonable agreement with each other over the range of hearing tested (280 Hz to 5.65 kHz for the ANTCs), and both show a slow monotonic increase in cochlear tuning quality over that range, as expected. These new results are then compared with (much) older existing behavioral estimates of frequency selectivity in the same species.

      Strengths:

      This topic is of interest, because there are some indications from the older behavioral literature that budgerigars have a region of best tuning, which the current authors refer to as an 'acoustic fovea', at around 4 kHz, but that beyond 5 kHz the tuning degrades. Earlier work has speculated that the source could be cochlear or higher (e.g., Okanoya and Dooling, 1987). The current study appears to rule out a cochlear source to this phenomenon.

      Weaknesses:

      The conclusions are rendered questionable by two major problems.

      The first problem is that the study does not provide new behavioral data, but instead relies on decades-old estimates that used techniques dating back to the 1970s, which have been found to be flawed in various ways. The behavioral techniques that have been developed more recently in the human psychophysical literature have avoided these well-documented confounds, such as nonlinear suppression effects (e.g., Houtgast, https://doi.org/10.1121/1.1913048; Shannon, https://doi.org/10.1121/1.381007; Moore, https://doi.org/10.1121/1.381752), perceptual confusion between pure-tone maskers and targets (e.g., Neff, https://doi.org/10.1121/1.393678), beats and distortion products produced by interactions between simultaneous maskers and targets (e.g., Patterson, https://doi.org/10.1121/1.380914), unjustified assumptions and empirical difficulties associated with critical band and critical ratio measures (Patterson, https://doi.org/10.1121/1.380914), and 'off-frequency listening' phenomena (O'Loughlin and Moore, https://doi.org/10.1121/1.385691). More recent studies, tailored to mimic to the extent possible the techniques used in ANTCs, have provided reasonably accurate estimates of cochlear tuning, as measured with ANTCs and SFOAEs (Shera et al., 2003, 2010; Sumner et al., 2010). No such measures yet exist in budgerigars, and this study does not provide any. So the study fails to provide valid behavioral data to support the claims made.

      We appreciate the reviewer’s efforts in summarizing and critiquing our study. We feel that the budgerigar data collected by the Dooling and Saunders labs remain essentially valid today. The methods used in these behavioral studies are rigorous and remain widely used in animal research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). The methods are based on the same power-spectrum-model assumptions of auditory masking as even the most recent and elaborate human psychophysical procedures. We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. More importantly, while forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning in humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial. For example, one study showed a closer match between forward-masking results and auditory-nerve tuning (ferret: Sumner et al., 2018), whereas several others showed a close match for simultaneous masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as valid while also discussing limitations and remaining open to alternative interpretations. We added these points to the discussion and added clarification throughout as to the specific behavioral approaches used.

      The second, and more critical, problem can be observed by considering the frequencies at which the old behavioral data indicate a worsening of tuning. From the summary shown in the present Fig. 2, the conclusion that behavioral frequency selectivity worsens again at higher frequencies is based on four data points, all with probe frequencies between 5 and 6 kHz. Comparing this frequency range with the absolute thresholds shown in Fig. 3 (as well as from older budgerigar data) shows it to be on the steep upper edge of the hearing range. Thus, we are dealing not so much with a fovea as the point where hearing starts to end. The point that anomalous tuning measures are found at the edge of hearing in the budgerigar has been made before: Saunders et al. (1978) state in the last sentence of their paper that "the size of the CB rapidly increases above 4.0 kHz and this may be related to the fact that the behavioral audibility curve, above 4.0 kHz, loses sensitivity at the rate of 55 dB per octave."

      Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, in humans as well as in other species. The few attempts to measure human frequency selectivity at that upper edge have resulted in quite messy data and unclear conclusions (e.g., Buus et al., 1986, https://doi.org/10.1007/978-1-4613-2247-4_37). Indeed, the only study to my knowledge to have systematically tested human frequency selectivity in the extended high frequency range (> 12 kHz) seems to suggest a substantial broadening, relative to the earlier estimates at lower frequencies, by as much as a factor of 2 in some individuals (Yasin and Plack, 2005; https://doi.org/10.1121/1.2035594) - in other words by a similar amount as suggested by the budgerigar data. The possible divergence of different measures at the extreme end of hearing could be due to any number of factors that are hard to control and calibrate, given the steep rate of threshold change, leading to uncontrolled off-frequency listening potential, the higher sound levels needed to exceed threshold, as well as contributions from middle-ear filtering. As a side note, in the original ANTC data presented in this study, there are actually very few tuning curves at or above 5 kHz, which are the ones critical to the argument being forwarded here. To my eye, all the estimates above 5 kHz in Fig. 3 fall below the trend line, potentially also in line with poorer selectivity going along with poorer sensitivity as hearing disappears beyond 6 kHz.

      This is an excellent point, also raised by reviewer 1, that declining audibility above 4 kHz could influence behavioral tuning measures. While we acknowledge this possibility, declining audibility cannot fully explain the unusual pattern of behavioral frequency tuning in budgerigars considering that other bird species with the same audiogram phenotype show conventional tuning patterns. We added these points to the Introduction and Fig. 1B. We also added clarification throughout that it is not just the shape of the tuning function that is noteworthy in budgerigars, but also the extreme slope in the 1-3.5 kHz region. Behavioral tuning quality in budgerigars increases by 5.3 dB/octave in this range (i.e., nearly doubling with each octave increase in frequency), vs. 1.8 dB/octave in humans, 2.5 dB/octave in ferret, 1.1 dB/octave in macaque, and 1.9 dB/octave in starling. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars.
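      To make the slope comparison concrete, the dB/octave values can be converted into a multiplicative change in tuning quality per octave. The response does not state which decibel convention was used, so the sketch below assumes 20·log10 of the quality factor, the convention under which 5.3 dB/octave corresponds to "nearly doubling". The function name and the `db_scale` parameter are illustrative, not taken from the manuscript.

```python
def factor_per_octave(slope_db_per_octave, db_scale=20.0):
    """Convert a slope in dB/octave to a multiplicative factor per octave.

    Assumption: dB = db_scale * log10(ratio). db_scale=20 is assumed here;
    pass db_scale=10 for a power-style convention.
    """
    return 10 ** (slope_db_per_octave / db_scale)


# Slopes quoted in the response (dB/octave).
slopes = {
    "budgerigar": 5.3,
    "human": 1.8,
    "ferret": 2.5,
    "macaque": 1.1,
    "starling": 1.9,
}

# Multiplicative increase in tuning quality per octave under the assumption.
factors = {species: factor_per_octave(s) for species, s in slopes.items()}
```

      Under this assumed convention, the budgerigar slope corresponds to a factor of roughly 1.8 per octave, whereas the other species fall between roughly 1.1 and 1.3 per octave.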

      The basic question posed in the current study title and abstract seems a little convoluted (why would you expect a behavioral measure to reflect cochlear mechanics more accurately than a cochlear-based emissions measure?). A more intuitive (and likely more interesting) way of framing the question would be "What is the neural/mechanical source of a behaviorally observed acoustic fovea?" Unfortunately, this question does not lend itself to being answered in the budgerigar, as that 'fovea' turns out to be just the turning point at the end of the hearing range. There is probably a reason why no other study has referred to this as an acoustic fovea in the budgerigar.

      Overall, a safe interpretation of the data is that hearing starts to change (and becomes harder to measure) at the very upper frequency edge, and not just in budgerigars. Thus, it is difficult to draw any clear conclusions from the current work, other than that the relations between ANTC and SFOAEs estimates of tuning are consistent in budgerigar, as they are in most (all?) other species that have been tested so far.

      We removed the term fovea from the paper. See above for our argument that unusual behavioral tuning in budgerigars is not simply or fully explainable by the audiogram.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 34. As far as I could tell, no other study has referred to this region in budgerigar as an acoustic fovea. Probably for good reason (see above). This wording should probably be avoided.

      We removed the term.

      Line 35. Describing 3.5-4 kHz as 'mid-frequencies' is a stretch. 4 kHz is actually the corner frequency, above which hearing degrades.

      We added a more detailed and accurate description of the tuning pattern.

      Lines 89-91. This seems a nice statement of the problem, and to my mind makes for a much better rationale for the study.

      Line 255. "mixed effect" should be "mixed effects".

      We made the correction.

      Line 380. Kuhn and Saunders didn't measure high enough to detect any changes in tuning.

      We removed the reference here.

    1. Author response:

      Public Reviews:  

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase (i.e., 10 trials per condition). This trial number was chosen to ensure comparability with previous studies employing similar designs and research questions (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We will acknowledge this limitation and future direction in the revised manuscript. 

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify that participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We will add this information to the revised version of the manuscript. 

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We will address this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session. 

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that it is unfortunate that 25 participants were conditioned with an unbalanced number of trials per condition during the conditioning session. In the revised version of the manuscript, we will include additional analyses to demonstrate that this imbalance did not systematically bias the results and that the findings observed during the test phase remain robust despite this error.  

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the large sample size, methodological rigor, and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We apologize that the registration number alone does not directly lead to the preregistration of this study. We thank the reviewer for pointing this out and will include a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for this comment and suggestion. In the revised version, we will restructure the manuscript and include more detailed information about the key experimental procedures and design at the beginning of the Results section to enhance clarity and improve the interpretability of the reported findings.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a detailed phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms, whose MSH6 contains either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution through the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

      Thank you for the helpful suggestions on this front. We agree that some of the structural visualizations were oversimplified and apologize for the lack of clarity. Notably, we did not annotate the putative or known short PCNA-interacting protein (PIP) motifs that have been found in the disordered N-terminal linker of MSH6 proteins. Indeed, while not directly related to our investigation of the origins of histone readers, the PIP motifs are an interesting and functionally important feature of MSH6 structural biology, especially because they may facilitate DNA repair processes more generally. In the revised manuscript, we aim to improve the scholarship on this topic and clarify the presence and importance of this motif for MSH6 function, as well as what is known about the structural biology of the MSH6 N-terminus more broadly. We will add annotations of the PIP motif and will also improve structural prediction by visualizing MSH6 structure in its dimerized form with MSH2, for a more accurate estimate of its folding in vivo. We hope that these changes, in addition to the other valuable suggested improvements, will enhance the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      The comparative genomic analyses performed here revealed independent instances of MSH6 fusion with histone readers in plants and metazoa, with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion with putatively interesting domains in fungi, which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids of MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes in the context for instance of cancer in humans, with sometimes implications in treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

      Thank you, we will also address your suggestions, which provide valuable recommendations for improving the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

      We agree that this work expands on recent experimental work in humans and Arabidopsis on the function of histone readers in MSH6, PWWP and Tudor, respectively. However, the evolution of these fusions remained a significant knowledge gap, limiting the degree to which functional work could be translated to other organisms. This study characterizes the evolutionary history of MSH6 histone readers and lays the groundwork for future investigations in diverse species. We agree that more causal inference would be valuable to understand the evolutionary pressures acting on MSH6 histone reader presence/absence. Indeed, we prioritized the conservative approach of testing hypotheses with strict phylogenetically constrained contrasts. While we observed highly significant associations between histone readers and genomic traits like intron content, associations with life history traits were only significant before accounting for phylogeny. It is possible that this is due to a lack of power because such traits are only available in limited taxa. In the revised manuscript, we aim to clarify potential causes, outline future experimental work beyond the scope of this individual study, and argue that this work highlights the need to catalog trait diversity at broader phylogenetic scales. We also address other valuable suggestions in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. 
      While faster frequency rates broadly provide a higher probability to detect 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model predicts that alpha oscillatory phase might determine how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      We thank the reviewer for highlighting this important point. In our study, all explicit trials consistently presented two flashes. We will clearly state this detail in the Methods section to avoid any further confusion.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      We thank the reviewer for raising this point. First, we would like to clarify that our results do not suggest that the frequency effect is absent in the F2B1 condition; rather, it is relatively attenuated compared to the F2 condition. If the sample size were the primary issue, we would expect to observe a null effect in both conditions. Instead, the stronger frequency modulation in F2 confirms that the sound-induced modulation is present, albeit reduced in the audiovisual context. In our revised manuscript, we will explicitly note that our claim is not that there is no frequency effect in F2B1 but that the effect is weaker relative to F2, and we will also acknowledge the potential limitations associated with sample size and the lack of individualized frequency targeting.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      We thank the reviewer for pointing out the potential bias toward reporting “two flashes” in the bistable trials. Because our experimental design involves presenting two flashes in both explicit and bistable trials, a slight tendency to report two flashes may naturally arise, especially at threshold levels determined during pretesting. We believe, however, that this bias does not undermine our primary findings. Our psychophysical procedure is designed to align the inter-stimulus interval with each participant’s fusion threshold, aiming for a near 50/50 split between “one-flash” and “two-flash” reports. However, given that two flashes are always presented, participants may be predisposed to report two flashes when uncertain. This reflects a plausible perceptual bias inherent in the bistable design, rather than a systematic flaw. Importantly, this tendency appears at comparable levels in both the F2 and F2B1 conditions, indicating that it does not selectively affect any particular condition. In the revised manuscript, we will include additional descriptive statistics, such as means and standard deviations, to demonstrate that the observed bias remains within an acceptable range and does not compromise our core conclusions regarding the modulatory effect of auditory input on visual integration.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time).

      We appreciate the reviewer’s concern regarding this issue. First, the sustained decrease in posterior alpha frequency observed in our study—persisting for approximately 280 ms—substantially exceeds the typical duration of an auditory evoked potential (generally 50–200 ms) (Näätänen and Picton, 1987). This extended period of modulation suggests that it is not merely a transient evoked response.

      Second, our analysis of alpha power further supports this interpretation. A purely evoked response is usually accompanied by a corresponding increase in signal power; however, our results show no such power increase when comparing the F2B1 condition with the F2 condition.

      Moreover, the observed increase in alpha phase resetting—as measured by inter-trial phase coherence (ITC)—does not significantly correlate with changes in alpha power. This dissociation further indicates that the auditory-induced effects are unlikely to be driven solely by evoked potentials, but are more consistent with a reorganization of the intrinsic neural oscillatory activity.

      Together, these lines of evidence strongly support the view that the auditory-induced decrease in alpha frequency reflects true changes in ongoing oscillatory dynamics, rather than being merely a transient evoked response.
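
      For readers less familiar with the measure, inter-trial phase coherence as referred to above is commonly computed as the length of the mean unit phase vector across trials. The following is a minimal sketch of that standard definition, not the analysis pipeline used in the study; the function name and the array layout (trials by time points) are assumptions made for illustration.

```python
import numpy as np


def inter_trial_coherence(phases):
    """Return ITC per time point for phases of shape (n_trials, n_times).

    Each phase angle (radians) is mapped to a unit vector on the complex
    plane; ITC is the magnitude of the across-trial mean vector. It is 1
    when all trials share the same phase and near 0 when phases are
    spread evenly around the circle. The layout is a hypothetical choice.
    """
    phases = np.asarray(phases, dtype=float)
    return np.abs(np.mean(np.exp(1j * phases), axis=0))


# Illustrative use: perfectly aligned phases versus evenly spread phases.
aligned = np.full((50, 4), 0.3)
spread = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False).reshape(8, 1)
itc_aligned = inter_trial_coherence(aligned)
itc_spread = inter_trial_coherence(spread)
```

      Because the measure depends only on phase angles, it can increase after a phase reset without any accompanying change in signal power, which is the dissociation the argument above relies on.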

      Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms.

      We appreciate the reviewer’s insightful comment regarding the potential influence of flash-induced phase resetting on the temporal extent of the IAF differences. We acknowledge that the observation—that wider ISIs are associated with a longer period of IAF differences—hints at a non-linear or even super-additive interaction between auditory- and flash-induced phase resetting mechanisms.

      However, the primary focus of our current study is on how auditory stimuli affect alpha oscillatory dynamics. Our experimental design and computational model were specifically optimized to capture auditory-induced phase resetting. Incorporating the additional influence of flash-induced effects would require a significantly more refined experimental framework and a more complex modeling approach. This added complexity could obscure the interpretation of our main findings, which are centered on auditory influences.

      In the revised manuscript, we will address this intriguing possibility in the Discussion section. We will acknowledge that while the data hint at a potential visual contribution, our present model deliberately isolates auditory-induced phase resetting to maintain clarity. We also propose that future research, with more precise experimental designs and enhanced modeling techniques, is necessary to fully disentangle and capture the interplay between auditory and flash-induced phase resetting mechanisms.

      Relatedly, I would also suggest the authors to throw in a few more simulations to explore the parameter space and assay, to which quantitative extent the model still holds (e.g. allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms).

      We appreciate the reviewer’s suggestion to further explore our model’s parameter space. In response, we will conduct additional simulations that incorporate variability in alpha frequency—sampling randomly between 8 and 13 Hz—and examine alternative phase delays (e.g., around 10 and 50 ms). By systematically adjusting these parameters, we can more thoroughly evaluate the model’s robustness and delineate its boundaries under a broader range of neurophysiological conditions. We will present these results in the revised manuscript and discuss how they inform our understanding of alpha-driven visual integration in cross-modal contexts.
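
      As a rough illustration of what sampling alpha frequency between 8 and 13 Hz implies under a plain discrete sampling account, the sketch below estimates how often two flashes fall into different alpha cycles. It is a simplified toy, not the authors' computational model: it assumes a uniformly random phase for the first flash, treats cycle boundaries as hard edges, and omits auditory phase resetting and the phase delay entirely. All function and parameter names are hypothetical.

```python
import numpy as np


def p_two_flashes(isi_s, alpha_hz):
    """Analytic probability that two flashes land in different cycles.

    With the first flash at a uniformly random phase, the second flash
    crosses a cycle boundary with probability ISI * frequency, capped at 1.
    """
    return np.minimum(1.0, isi_s * alpha_hz)


def simulate_two_flash_rate(isi_s, n_trials=10000, f_low=8.0, f_high=13.0, seed=0):
    """Monte Carlo estimate with alpha frequency drawn per trial from [f_low, f_high)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(f_low, f_high, size=n_trials)
    # Position of the first flash within its cycle, as a fraction of the cycle.
    first = rng.uniform(0.0, 1.0, size=n_trials)
    # The second flash arrives isi_s * freqs cycles later.
    second = first + isi_s * freqs
    # The flashes are segregated when they occupy different cycles.
    return float(np.mean(np.floor(second) > np.floor(first)))


rate_50ms = simulate_two_flash_rate(0.05)
```

      For a 50 ms interval the analytic expectation under these assumptions is the mean of ISI times frequency over the sampled range, i.e. 0.05 × 10.5 = 0.525, which the simulated rate should approximate.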

      As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney, 2022; as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as a too small sample size (Samaha & Romei, 2024).

      We appreciate the reviewer’s suggestion and will revise our claims accordingly. In the revised manuscript, we will clarify that while our study demonstrates a mechanism by which alpha oscillations influence audiovisual integration in certain paradigms, this does not mean that our findings fully reconcile all conflicting results in the literature. We will emphasize that our mechanism may help explain why alpha frequency plays a critical role in some experimental settings, but that factors such as sample size, task parameters, and experimental design differences likely contribute to the divergent results observed across studies. Accordingly, we acknowledge that further research with larger samples and more refined methodologies is necessary to fully reconcile these discrepancies. This more cautious interpretation will be clearly discussed in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches-from behavior to computational modeling and EEG-to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      We appreciate the reviewer’s suggestion regarding the use of Bayesian statistics. We agree that a Bayesian framework can offer valuable complementary insights to our analysis by helping to distinguish whether a marginal p-value represents a trend or truly indicates the absence of an effect. To enhance the robustness of our conclusions, we will incorporate supplemental Bayesian analyses in the revised manuscript.
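
      One simple way to see how a Bayes factor separates a marginal trend from evidence for the null is the BIC approximation for a one-sample or paired t-test. The sketch below is offered only as an illustration under that approximation; it is not necessarily the Bayesian analysis that will be reported, and the function name is hypothetical.

```python
import math


def bf01_bic_one_sample_t(t, n):
    """Approximate Bayes factor in favour of the null (BF01).

    Uses the BIC approximation for a one-sample or paired t-test:
    BF01 ~ sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2).
    Values above 1 favour the null; values below 1 favour an effect.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    df = n - 1
    log_bf01 = 0.5 * math.log(n) - 0.5 * n * math.log1p(t * t / df)
    return math.exp(log_bf01)


# Illustrative values only; these are not statistics from the study.
bf_null = bf01_bic_one_sample_t(0.0, 25)
bf_effect = bf01_bic_one_sample_t(3.0, 25)
```

      Under this approximation a t value of zero yields BF01 equal to the square root of the sample size, so larger samples give stronger support for the null, whereas a large t value pushes BF01 below 1.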

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      We appreciate the reviewer’s valuable feedback regarding the importance of inter-individual variability in alpha frequency. In our current study, we have already addressed participant-level variability in our neural data by performing inter-subject correlation analyses, investigating whether individual reductions in alpha frequency correlate with broader temporal integration windows at the behavioral level.

      Moreover, our computational model incorporates physiologically realistic distributions for key parameters, including frequency and amplitude, which captures some degree of individual variability. Nevertheless, we acknowledge that a more targeted examination of how different IAF values specifically affect the model’s predictions would be highly valuable. In response, we will expand our simulations to systematically explore a range of IAF values and assess their impact on temporal integration windows and related measures of audiovisual processing. These additional analyses will help clarify the role of inter-individual variability in alpha frequency and further strengthen the mechanistic account offered by our model. We will detail these enhancements and discuss their implications in the revised manuscript.

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Instead of inducing alpha brain oscillations in visual cortices, the use of tACS to activate the auditory cortex instead of the actual auditory stimulation would have presented more interest.

      We appreciate the reviewer’s suggestion and acknowledge that non-invasive brain stimulation offers promising avenues for inferring causality. In our study, our primary hypothesis focused on the role of occipital alpha oscillations in defining the temporal window for visual integration, and accordingly we targeted visual cortex in our tACS protocol.

      We recognize that stimulating the auditory cortex could provide additional insights into auditory contributions to phase resetting. However, accurately targeting the auditory cortex with tACS presents technical challenges. The auditory cortex is located deeper within the temporal lobe, and factors such as variable skull thickness and complex current spread make it difficult to reliably modulate its neural activity compared to the more superficial visual areas. Indeed, recent studies have demonstrated that tACS-induced electric fields in the temporal regions tend to be weaker and less focal—for example, Huang et al. (2017) and Opitz et al. (2016) highlight the limitations in achieving robust stimulation of deeper or anatomically complex brain regions using conventional tACS approaches.

      Given these considerations, while we agree that future investigations could benefit from exploring auditory cortex stimulation—either as an alternative or as a complementary approach—the present study remains focused on visual alpha modulation, where our protocol is well validated and yields reliable results. In the revised manuscript, we will clearly discuss these issues and acknowledge the potential, yet technically challenging, possibility of stimulating the auditory cortex in future work to further disentangle the contributions of auditory and visual inputs to cross-modal integration.

    1. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not towards the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features that were moved between training and testing (neither the camera nor the wall was moved between training and testing). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate if bees would use ground views or bird’s-eye views to home in a dense environment, we think the catchment volumes would provide qualitatively similar, though quantitatively more detailed, information compared with catchment slices. Our approach of catchment slices is sufficient to predict whether ground or bird's-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, it is not always clear exactly where bees learn, or whether they learn continuously by weighting their visual experience according to their positions and orientations. This makes it difficult to categorise these flights accurately into learning and return flights. Additionally, homing models remain vague about the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
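
      For readers unfamiliar with the technique, a rotational image difference function can be sketched as follows. This is a minimal illustration only, not the model code used in the study: the root-mean-square pixel difference, the image size, and the names `rotational_idf`, `snapshot` and `view` are all assumptions made for the example.

```python
import numpy as np

def rotational_idf(memory, current):
    """RMS pixel difference between a stored panoramic snapshot and the
    current view, evaluated for every azimuthal rotation of the current view.

    Both inputs are 2-D arrays (elevation x azimuth) covering 360 degrees.
    """
    n_azimuth = memory.shape[1]
    diffs = np.empty(n_azimuth)
    for shift in range(n_azimuth):
        # Rotate the current view about the vertical axis by `shift` columns.
        rotated = np.roll(current, shift, axis=1)
        diffs[shift] = np.sqrt(np.mean((rotated - memory) ** 2))
    return diffs

rng = np.random.default_rng(0)
snapshot = rng.random((20, 72))        # hypothetical panorama, 5 degrees per column
view = np.roll(snapshot, -10, axis=1)  # same location, heading rotated by 10 columns
ridf = rotational_idf(snapshot, view)
best_shift = int(np.argmin(ridf))      # rotation that best re-aligns view and memory
```

      The minimum of the function gives the rotation that best aligns the current view with the memorised snapshot; how quickly that minimum becomes shallower as the viewpoint moves away from the snapshot location is what limits the catchment area.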

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      We thank the reviewer for their comments and address both points.

      Conceptually, there is a key difference between encoding predictions, i.e. pre-activations of future words, versus encoding stimulus dependencies. The speech acoustics provide a useful control case: they encode the stimulus (and therefore stimulus dependencies) but do not predict. When we apply the encoding analysis to the acoustics (i.e. when we estimate the acoustics pre-onset from post-onset words), we observe the “hallmarks of prediction” – yet, clearly, the acoustics aren't "predicting" the next word.

      This reveals the methodological issue: if the brain were just passively filtering the stimulus (akin to a speech spectrogram), these "prediction hallmarks" would still appear in the acoustics encoding results, despite no actual prediction taking place. Therefore, one necessary criterion for concluding pre-activation from pre-stimulus neural encoding, is that at least the pre-stimulus encoding performance is better on neural data than on the stimulus itself. This would show that the pre-onset neural signal contains additional predictive information about the next word beyond that of the stimulus (e.g. acoustics) itself. We will make this point more prominent in the revision.

      Regarding the regression: different weights are estimated per time point in a time-resolved regression. This allows for modeling of unfolding responses over time, but also for the learning of stimulus dependencies.
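
      The point can be illustrated with a small simulation. This is a sketch under stated assumptions, not our analysis pipeline: the word features follow a first-order autoregressive process, lags are counted in whole words rather than milliseconds, and the names `emb`, `signal` and `encoding_score` are hypothetical. The simulated system responds only to the current word and never predicts, yet a separate regression per lag still finds above-chance "pre-onset" encoding.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, dim, rho = 4000, 5, 0.6

# Word features with sequential dependencies (AR(1) process; an assumption).
emb = np.zeros((n_words, dim))
emb[0] = rng.standard_normal(dim)
for t in range(1, n_words):
    emb[t] = rho * emb[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(dim)

# A passive system: its output depends only on the current word, never on future words.
w_true = np.ones(dim)
signal = emb @ w_true + 0.5 * rng.standard_normal(n_words)

def encoding_score(lag, alpha=1.0):
    """Fit ridge weights for one lag and return the held-out correlation.

    lag < 0 asks: do features of word t explain the signal |lag| words BEFORE word t?
    """
    idx = np.arange(max(0, -lag), n_words - max(0, lag))
    X, y = emb[idx], signal[idx + lag]
    half = len(idx) // 2  # first half for training, second half for testing
    Xtr, ytr, Xte, yte = X[:half], y[:half], X[half:], y[half:]
    w = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(dim), Xtr.T @ ytr)
    return np.corrcoef(Xte @ w, yte)[0, 1]

# One set of weights per lag, as in a time-resolved regression.
scores = {lag: encoding_score(lag) for lag in range(-3, 4)}
```

      In this toy setting the score peaks at lag 0 and decays with distance on both sides, so the negative-lag scores reflect only the dependencies between neighbouring words. Here `signal` plays the role of the acoustics: any claim of neural pre-activation would need pre-onset scores on neural data that exceed those obtained from such a non-predicting system.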

      To sum up, the difference between encoding dependencies and predictions is at the core of our work. We appreciate this was not clear in the initial version and we will make this much clearer in the revision, conceptually and methodologically.

      Reviewer #2 (Public review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe the limitation we highlight extends beyond the specific method of residualizing features. Rather, it points to a fundamental problem: adjusting the features (X matrix) alone cannot address stimulus dependencies that persist in the signal (y matrix), as we demonstrate by using a different signal (acoustics) that encodes no predictions. While removing dependencies from the signal would be more thorough, this would also eliminate the effect of interest. We view this as a fundamental limitation of the encoding analysis approach combined with the experimental design, rather than something that can be resolved analytically. We will perform additional analyses to test this premise and elaborate on this point in our revision.

      Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically the auto-correlation in the stimuli, rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for their comments.

      We want to address a key unclarity regarding the procedure of regressing out embedding dependencies. While Goldstein et al. showed that neural encoding results persist after their control analysis (like we did, too, in our supplementary Figure S3), this does not lessen the concern surrounding stimulus dependencies. Our analyses demonstrate that even after such residualization, the "hallmarks of prediction" remain encodable in the speech acoustics – a control system that, by definition, cannot predict upcoming words. Therefore, the hallmarks of prediction can be fully explained by stimulus dependencies. This persistence in the acoustics strengthens rather than lessens our concerns about dependencies.

      This connects to a broader methodological point: our key evidence comes from analyzing the stimulus material itself as a control system. By comparing results from encoding neural responses to those from a system that encodes the stimulus, and therefore its dependencies, but cannot predict the upcoming input (like acoustics), we can establish proper criteria for concluding that the brain engages in prediction. Notably, the Goldstein dataset was not available when we conducted this research. However, for the revision we will perform additional analyses to make a more direct comparison.

      Finally, our focus was not to definitively test whether the brain predicts upcoming words, but rather to establish rigorous methodological and epistemological criteria for making such claims. We will elaborate on this crucial distinction in our revision and more prominently feature our central argument about the limitations of current evidence for neural prediction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to public reviews:

      We thank the reviewers for their careful evaluation of our manuscript and appreciate the suggestions for improvement. We will outline our planned revisions in response to these reviews.

      Reviewer 2: “The one exception is the claim that "maintenance of respiration is the only cellular target of chalkophore mediated copper acquisition." While under the in vitro conditions tested this does appear to be the case; however, it can't be ruled out that the chalkophore is important in other situations. In particular, for maintenance of the periplasmic superoxide dismutase, SodC, which is the other M. tuberculosis enzyme known to require copper.”

      And

      Reviewer 3: “Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.”

      Both comments concern the question of whether the bcc:aa3 respiratory oxidase supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that bcc:aa3 is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (lacking a subunit of bcc:aa3) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 oxidase is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrests growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting the bcc:aa3 oxidase, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added this caveat to the discussion of the revised version of this manuscript.

      Response to Reviewers Recommendations for the authors:

      Reviewing Editor Comments:

      In addition to the specific recommendations below, there was consensus that the conclusions/discussion should contextualize that the results cannot exclude that in other conditions (such as in infection), enzymes other than cytochrome bcc:aa3 receive copper from the chalkophore system.

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, the authors mention that the nrp operon is only present in pathogenic Mtb and Mycobacterium marinum but not in non-pathogenic mycobacteria. Is the nrp operon present in other pathogenic mycobacteria such as M. leprae, M. avium or M. abscessus?

      Bhatt et al (PMID 30381350) presented an analysis of the distribution of nrp gene clusters in mycobacteria and concluded that M. bovis, M. leprae and M. canetti clearly encode nrp genes. M. marinum has been shown to have a functional chalkophore biosynthetic cluster, but the presence of this system in other mycobacteria awaits experimental validation. We have added the Bhatt reference to this sentence in the introduction. 

      (2) Figure 1A - it would be helpful if the genes were grouped and labeled as per their purpose (for example, CytBD components, bcc:aa3 components). While these are described in the text, the genes belonging to the chalkophore cluster are not defined in the text, and are thus not easily identified in the figure.

      The order of genes in the heatmap is determined by unsupervised clustering as indicated by the dendrogram to the left of the heatmap. To highlight chalkophore and CytBD genes, we have added color coding to the gene names and explained this color coding in the legend. 

      (3) Figure 2B/2C - it is interesting that complementation of ΔnrpΔcydAB with cydABCD does not rescue growth to Δnrp levels. Is there an explanation for this? 

      AND

      (4) Figure 2C - BCS is not introduced in the text for this figure nor are the results described - which seems like an oversight. It is interesting that BCS treatment does have a full rescue with cydABCD complementation, while TTM treatment does not. Is there an explanation for this?

      We thank the reviewer for raising this issue. We have attempted several different complementation constructs, including CydAB alone and different promoters, to address the partial complementation in question. However, we do not have an adequate explanation for this partial complementation. As the reviewer notes, the partial complementation is only evident with TTM, not BCS. However, we cannot speculate on the reason for this difference at present.  We have added a note to the text in the results section noting this difference. 

      (5) Figure 2F - is there a reason for the change in TTM concentrations (50 μM TTM vs 10 μM TTM)? Is the concentration for Q203 in both single treatment and combinatory tests 100nM?  

      We have clarified the 100 nM Q203 concentration in the figure legend. To avoid confusion, we have removed the 50 µM TTM condition from panel F because the growth inhibition phenotype of 10 µM TTM is shown in panel E and is the comparator for the combined TTM/Q203 condition in panel F.

      (6) Figure 3A - I assume d0 = day 0, d3 = day 3. This should be defined.

      We have modified the legend to clarify these abbreviations. 

      (7) Figure 4B - as complementation of nrp for ΔnrpΔcydAB returns levels back to WT, I assume there is no attenuation with ΔcydAB alone? Clarification would be appreciated.

      The mouse phenotype of M. tuberculosis ΔcydAB is reported here: https://www.pnas.org/doi/10.1073/pnas.1706139114#sec-1. This paper is reference 22 of the manuscript and was noted in the discussion.

      Reviewer #2 (Recommendations for the authors):

      In vitro conditions that require SodC could reveal a role for the chalkophore (ie., exposure to extracellular or periplasmic superoxide stress under low iron conditions). Some minor confusion exists with the terminology around the two oxidases found in M. tuberculosis. The bcc:aa3 oxidase is a supercomplex between the reductase and oxidase complexes. This point should be clarified in the introduction as the term supercomplex isn't used until later in line 194 and without definition. Referring to the bcc:aa3 supercomplex as an oxidase is fine but is sometimes confusing especially when mentioning the target of Q203 is the oxidase as it targets the reductase portion of the supercomplex.

      We thank the reviewer for this point. We have modified the text to refer to the supercomplex at first mention and modified subsequent mentions to be clearer. 

      In the RNA preparation section boxes appear in several places where spaces should be.

      We do not see these boxes so we suspect this is a conversion error of some type. 

      Reviewer #3 (Recommendations for the authors):

      The authors have very carefully performed their studies and their main conclusions are amply supported by the data. The manuscript is also very clearly written, and easily accessible to a broad audience interested in both bioinorganic chemistry and mycobacteria. I have two recommendations:

      (1) I agree that the evidence shows that chalkophores provide copper to cytochrome bcc:aa3. Under lab-culture conditions, it could well be that, when cytochrome bd is deleted or inhibited, cytochrome bcc:aa3 is rate limiting. Under lab-culture conditions, it is also clear that only the expression of a select number of enzymes is affected. However, this does not mean that cytochrome bcc:aa3 is the ONLY enzyme that receives copper from chalkophores. Thus, under infection conditions, other copper enzymes might be important. For instance, M. tuberculosis expresses a Cu-Zn superoxide dismutase. In summary, perhaps the authors would consider changing the wording of statements such as that in Figure 2E and the conclusions drawn in the discussion.

      This comment concerns the question of whether the bcc:aa3 respiratory supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that the supercomplex is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (which lacks a subunit of the bcc:aa3 supercomplex) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 supercomplex is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting bcc:aa3, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added the following to the discussion: “Although chalkophore mediated protection of the bcc:aa3 supercomplex is an important virulence function, we cannot exclude the possibility that additional copper dependent enzymes use chalkophore delivered copper during infection.”

      (2) There is a difference between copper-uptake (e.g. by chalkophores) and the maturation of metallo-enzymes. A short paragraph discussing knowledge from other bacteria in this area would help understand the role chalkophores (e.g. see 10.1128/mBio.00065-18 or 10.1111/mmi.14701). This could possibly be extended with a genome analysis to check which other proteins are present in M. tuberculosis.

      We thank the reviewer for this point. We agree that our data does not distinguish between 1) a generic role for the chalkophore in copper uptake, with the ultimate candidate metalloenzyme rendered dysfunctional by copper loss, and 2) the chalkophore being an intrinsic part of the cytochrome maturation pathway and interacting directly with the target enzymes. We have added this point to the discussion but have not otherwise added the suggested full discussion of metalloenzyme maturation as we believe this discussion is beyond the scope of our data. 

      Finally, can I suggest the labels d0 and d3 are made clearer in Figure 3A (and defined in the legend).

      We have modified the legend to be clearer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We thank the editors and Reviewers 1 and 3 for their thoughtful consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

      The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

      Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

      For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

      Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

      I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur-- where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

      We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word for the former to get the gist of it.

      R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

      Original sentence 

      Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

      Revised sentence

      Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.

      R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

      To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018. 

      Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is an unsubstantiated assertion with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

      We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

      Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations. 

      To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018). 

      Thread 2 criticizes our evidence for adaptive investment and fitness, describing it as a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it to diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing: to put our results into conversation with the wider literature and set an agenda for future research.

      On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While the removal of Fasciclin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to induce non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results for the most part are well-controlled and justified.

      Weaknesses:

      The behavioral results shown in Figure 6 need more explanation and clarification (more details below). As currently shown, the results of Figure 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a technical perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript with a particular focus on Figure 6, as well as the overall presentation of the figure and its description in the legends, in accordance with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewer #2 (Public review):

      In this study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques, from western blot to behavioral experiments, and also that they generated several reagents, from fly lines to antibodies.

      This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a methodological perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript in line with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewing Editor comments:

      I am pleased to let you know that our reviewers found the results in your paper important and the evidence compelling. There are a few minor comments and a point was raised regarding figure 6 for which further details were asked. Please see the reviewer's comments. We are looking forward to receiving an updated version of your very interesting paper.

      We are grateful to you and the reviewers for dedicating time to review our manuscript and for providing insightful comments and suggestions. We have revised our manuscript in line with the reviewers' feedback. The major revision involves clarifying the two-choice preference assay presented in Figure 6. Details of these revisions are provided in our point-by-point responses to the reviewers’ comments below. The new and extensively modified sections of text are highlighted in blue. We have introduced new panels (Figures 1D, 3D, 6B, and 6C) and made modifications to Figure 6A. The previous Figure 1D has been relocated to Figure 1–figure supplement 1B. Additionally, our detailed responses to the reviewers’ comments are also highlighted in blue within the point-by-point response section. With all concerns and suggestions from the Editor and reviewers addressed, our conclusion—that executioner caspase is proximal to Fasciclin 3 which facilitates non-lethal activation in Drosophila olfactory receptor neurons—is now more robustly supported. We are confident that our revised manuscript makes a significant contribution to the fields of caspase function and neurobiology. We remain hopeful that the reviewers will find it suitable for publication in eLife.

      Reviewer #1 (Recommendations for the authors):

      The main comment here is related to Figure 6, which needs to be better explained. First, if the results in Figure 6B and C are conducted with young flies, why is the preference index close to 0? Aren't these young flies more attracted to ACV? Second, what are the results with Dronc-RNAi and DroncDN alone? These should be shown to more accurately assess the outcome of Fas3G expression with and without Dronc inhibition. Third, if Fas3G overexpression induces non-lethal caspase activation and a behavioral change, why does Dronc inhibition enhance (and not suppress) this behavioral change?

      We sincerely thank the reviewer for the comment. We used one-week-old young flies for the two-choice preference assay. We found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (New Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (New Figure 6B). Since our hypothesis is that non-lethal caspase activation suppresses attraction behavior, and that inhibiting caspase activation could enhance attraction, we used the milder experimental condition for subsequent analyses.

      In the original manuscript, we did not test Dronc inhibition alone because caspase activation is rarely observed in young flies (as demonstrated in Figure 3C, New Figure 3D, etc.), suggesting that Dronc inhibition during this stage would not affect behavior. This hypothesis is further supported by previous research showing that inhibition of caspase activity in aged flies restores attraction behavior but has no effect in young flies (Chihara et al., 2014). To validate this hypothesis, we conducted the two-choice preference assay again, including caspase activity inhibition by Dronc<sup>DN</sup> expression alone. As expected, Dronc inhibition alone did not alter behavior in young flies (New Figure 6C).

      We also observed that Fas3G overexpression promotes a weak, though not statistically significant, enhancement in attraction behavior. Importantly, simultaneous inhibition of caspase activity further enhanced attraction behavior (New Figure 6C). These results suggest that Fas3G overexpression has a dual function: one aspect promotes attraction behavior, while the other induces non-lethal caspase activation. In this context, non-lethal caspase activation appears to counteract the behavioral response, acting as a regulatory brake. To address the reviewer’s comments comprehensively, we included the New Figure 6B and replaced the original Figure 6B and C with New Figure 6C. Additionally, we revised the manuscript text as follows:

      Using a two-choice preference assay with ACV (Figure 6A), we found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (Figure 6B). Under the milder experimental condition, we first confirmed that inhibition of caspase activity through expressing Dronc<sup>DN</sup> didn’t affect attraction behavior in young adults (Figure 6C), consistent with a previous report (Chihara et al., 2014). We then observed that the overexpression of Fas3G, which activates caspases, did not impair attraction behavior. Instead, it appeared to enhance the tendency for attraction behavior (Figure 6C), suggesting that Fas3G promotes attraction behavior. Finally, we found that inhibiting Fas3G overexpression-facilitated non-lethal caspase activation by expressing Dronc<sup>DN</sup> strongly promoted attraction to ACV (Figure 6C). Overall, these results suggest that Fas3G overexpression has a dual function: it enhances attraction behavior while also triggering non-lethal caspase activation, which counteracts the behavioral response, functioning as a regulatory brake without causing cell death.

      Other minor comments are below:

      The authors should clarify that while they refer to their caspases reporters as "non-lethal caspase reporters", these are caspase reporters in general and can report both lethal and non-lethal caspase activation. Of course, the only surviving cells are those that experience non-lethal caspase activation.

      We thank the reviewer for pointing this out. This reporter can monitor caspase activation with high sensitivity only if the cell is capable of transcribing and translating the reporter proteins following cleavage of the probe, most likely in living cells. However, as mentioned, using the term “non-lethal reporter” is not accurate, as additional experiments are required to determine whether caspase activation leads to cell death. Therefore, we removed the term “non-lethal” and referred to this reporter simply as a highly sensitive caspase reporter in the revised manuscript.

      Some of the figure panels could be better described in the legends (e.g. Figure 1E, 1F, 4E, 4F).

      We thank the reviewer for the comment. We have included additional explanations in the figure legends throughout the manuscript.

      In Figure 3C, the OL and AL regions should be marked in the figure as done in Figure 1C.

      We thank the reviewer for the comment. We have marked OL and AL regions in Figure 3C and Figure 2A as in Figure 1C.

      In Figures 4A and B, the authors should rearrange the order of the x-axis to reflect the order that appears in the text (Dronc first).

      We thank the reviewer for the comment. We have rearranged the order of labels on the X-axis to reflect the order that appears in the text.

      In Figure 6B, do the colors imply anything? If so, it should be explained. 

      We thank the reviewer for pointing this out. We intended to use the colors where the light blue bars represent Fas3G overexpression, while the red dots indicate caspase-activated conditions. In the New Figure 6C, we used light blue dots for Fas3G overexpression and red bars for caspase-activated conditions. We have added an explanation in the figure legend. In addition, we have removed the colors in Figure 4B and have added an explanation in the figure legend in Figure 4D.  

      Reviewer #2 (Recommendations for the authors):

      (1) For the methods section make a table for the lines, the way they are listed is not the most easy to read.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3.

      (2) Lines 420 to 573, not sure why this is here, this information should be in the figure or figure legend, or make a table if necessary.

      We thank the reviewer for the comment. We have listed the detailed genotypes corresponding to each figure in Table S4.

      (3) Blocking with donkey serum, do you get better results than bovine?

      We have not conducted tests with bovine serum for immunohistochemistry. Donkey serum was used throughout the manuscript.

      (4) The Methods section is very thorough and complete but I recommend the use of tables to organize some of the reagents used.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3 and the detailed genotypes corresponding to each figure in Table S4.

      (5) Line 647 spells out LC-MS/MS.

      We thank the reviewer for pointing this out. We have provided the full spelling as “liquid chromatography-tandem mass spectrometry”.

      (6) Line 808 spells out ACV (apple cider vinegar) and MQ (MilliQ water).

      We thank the reviewer for pointing this out. We have provided the full spelling as suggested.

      (7) Figure 1D. Why do you use only females? 

      We thank the reviewer for pointing this out. In the original manuscript, we analyzed female flies by crossing each Gal4 strain with UAS-Drice-RNAi; Drice::V5::TurboID virgin females. In this case, because Pebbled-Gal4 is located on the X chromosome, we could only use female flies for the analysis. To address this, we examined the expression pattern in male flies by crossing each Gal4 virgin female with UAS-Drice-RNAi; Drice::V5::TurboID males. As expected, Drice expression is also mostly depleted when using the ORN-specific Gal4 driver, Pebbled-Gal4, suggesting that Drice expression is predominantly observed in ORNs in males as well. We have added New Figure 1D to present the male data. The original Figure 1D, which presents female data, has been relocated to Figure 1–figure supplement 1B.

      (8) Figure 1D. Be clear about the LN driver used here in the figure.

      We thank the reviewer for pointing this out. We used Orb<sup>0449</sup>-Gal4 driver (#63325, Bloomington Drosophila Stock Center), which has been previously characterized as an LN-specific Gal4 driver (Wu et al., 2017). Accordingly, we have revised “LN-Gal4” to “Orb<sup>0449</sup>-Gal4” throughout the manuscript.

      (9) Figure 1 and Supplementary Figure 1 images are very good. I would recommend the use of a different color palette, to help visualization for colorblind readers (such as this reviewer).

      We apologize for any inconvenience caused. We chose the green/magenta color pair because these are complementary colors, which generally provide better contrast compared to other color pairs. Therefore, we have decided to continue using this pair. To enhance readability, we have intensified the magenta signal in the New Figure 1D and Figure 1–figure supplement 1B. We retained the original magenta signal levels in Figure 1C and Figure 1–figure supplement 1A to avoid oversaturation. Instead, we have kept the Streptavidin-only signal images alongside the color merged images for clarity. We hope these adjustments improve the visualization and help you better interpret the figures.

      (10) Based on Supplementary Figure 1 and based on the fact that Figures 1B and 1C use males, why not used also males for Figure 1D?

      Please refer to our reply to comment #7. We have now included the results for males in the New Figure 1D, which show a similar expression pattern to that observed in females. The results for females originally shown in Figure 1D have been relocated to Figure 1–figure supplement 1B.

      (11) Why were the old versus young flies used for Figure 3 raised at 29C? Why not let the animals age at 25C? The use of 29C throughout the manuscript is not clear.

      We thank the reviewer for pointing this out. Most of the UAS fly strains used in this study, including a Fas3G overexpression line, are UASz lines, which exhibit relatively low expression levels compared to UASt lines (DeLuca and Spradling, 2018). Since the Gal4/UAS system is temperature-dependent (Duffy, 2002), we performed most of the experiments at 29°C to enhance gene expression.

      For the aging experiments, we chose to rear flies at 29°C because higher temperatures accelerate aging, including neuronal aging (Okenve-Ramos et al., 2024), allowing for faster experimentation, and 29°C is within the ecologically relevant range of temperatures for Drosophila melanogaster (Soto-Yéber et al., 2018). Additionally, we confirmed that a subset of olfactory receptor neurons undergo aging-dependent caspase activation at both 29°C and 25°C, as shown in New Figure 3D.

      (12) Why not use an Or42b specific GAL 4 for the aging experiment? What are the odorants that are detected by this ORN? Are any of the odorants behaviorally relevant compounds?

      We thank the reviewer for pointing this out. While the exact odorant detected by Or42b neurons has not been fully determined, these neurons innervate the DM1 region in the antennal lobe, which is activated by ACV. Additionally, Or42b neurons have been shown to be required for attraction behavior to ACV (Semmelhack and Wang, 2009), supporting the relevance of ACV for the behavioral experiment. We used Or42b-Gal4 to confirm that Or42b neurons undergo aging-dependent caspase activation, which is detectable using the MASCaT system (New Figure 3D). Furthermore, we verified that these neurons exhibit aging-dependent caspase activation at both 25°C and 29°C (New Figure 3D).

      (13) Make the panel lettering in all the figures bigger or bold.

      We thank the reviewer for pointing this out. We have increased the size of the panel lettering and made it bold throughout the figures to improve the readability.

      (14) Line 806. MilliQ water.

      We thank the reviewer for pointing this out. We have ensured that “MilliQ water” is consistently spelled this way throughout the manuscript.

      (15) Figure 6. The authors need to be more clear on the experimental conditions. At what time of the day was this experiment performed? Was the experiment run in DD? Were the flies young or old?

      We thank the reviewer for pointing this out. We performed the assay using one-week-old young flies under constant dark conditions during both the starvation period and the assay. We have added a detailed explanation in the Methods section. For clarity, we have also revised Figure 6A to provide a more detailed explanation of the experimental setup.

      References

      Chihara T, Kitabayashi A, Morimoto M, Takeuchi K-I, Masuyama K, Tonoki A, Davis RL, Wang JW, Miura M. 2014. Caspase inhibition in select olfactory neurons restores innate attraction behavior in aged Drosophila. PLoS Genet 10:e1004437.

      DeLuca SZ, Spradling AC. 2018. Efficient expression of genes in the Drosophila germline using a UAS promoter free of interference by Hsp70 piRNAs. Genetics 209:381–387.

      Duffy JB. 2002. GAL4 system in Drosophila: a fly geneticist’s Swiss army knife. Genesis 34:1–15.

      Okenve-Ramos P, Gosling R, Chojnowska-Monga M, Gupta K, Shields S, Alhadyian H, Collie C, Gregory E, Sanchez-Soriano N. 2024. Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22:e3002504.

      Semmelhack JL, Wang JW. 2009. Select Drosophila glomeruli mediate innate olfactory attraction and aversion. Nature 459:218–223.

      Soto-Yéber L, Soto-Ortiz J, Godoy P, Godoy-Herrera R. 2018. The behavior of adult Drosophila in the wild. PLoS One 13:e0209917.

      Wu B, Li J, Chou Y-H, Luginbuhl D, Luo L. 2017. Fibroblast growth factor signaling instructs ensheathing glia wrapping of Drosophila olfactory glomeruli. Proc Natl Acad Sci U S A 114:7505–7512.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small spatial scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest hole within a confined area. While many studies have focused on larger spatial scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing, especially in dense environments as we propose here.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      See the new discussion at lines 192-197

      We agree with your comment about the term "clutter". Therefore, we referred to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      See line 20 and we changed the wording throughout the manuscript and figures.

      Reviewer 1 (Recommendations): 

      The manuscript is well written, nicely designed experiments and well illustrated. I have a few comments below.

      It would be useful to discuss known data of learning flights in bumblebees, and the height or catchment area of their flights. This will allow the reader to compare your exp design to the natural learning flights.

      In our study, we first focused on demonstrating the ability to solve a homing task in a dense environment. As we observed the bees returning within the dense environment and not from above it (contrary to the model predictions), we investigated whether they flew above it during their first flights. The bees did indeed fly above, demonstrating their ability to ascend and descend within the constellation of objects (see Supplementary Material Fig. 22).

      In nature, the learning flight of bumblebees may cover several decametres, with the loops performed during these flights increasing with flight time (e.g. Osborne et al. 2013; Woodgate et al. 2016). A similar pattern can be observed on a smaller spatial scale (e.g. Philippides et al. 2013). Similar to the loops that extend over time, the bees gradually gain altitude (Lobecke et al., 2018). However, these observations come from studies where few conspicuous objects surround the nest entrance.

      Although our study focussed on the performance in goal finding in cluttered environments, we now also address the issue of learning flights in the discussion, as learning flights are the scaffolding of visual learning. We have already conducted several learning flight experiments to fill the knowledge gap mentioned above. These will allow us in a forthcoming paper to compare learning flights in this environment with the existing literature (Sonntag et al., 2024).

      We added a reference to this in the discussion (lines 218-219 and 269-272)

      Include bumblebee in the title rather than 'bees'.

      We adapted the title accordingly:

      “Switching perspective: Comparing ground-level and bird’s-eye views for bumblebees navigating dense environments”

      I found switching between bird-views and frog-views to explain bee-views slightly tricky to read. Why not use 'ground-views', which you already have in the title?

      We agree and adapted the wording in the manuscript according to this suggestion.

      I am not convinced there is evidence here to suggest the bees do not use view-based navigation, because of the following: In L66: unclear what were the views centred around, I assume it is the nest. Is 45cm above the ground the typical height gained by bumblebees during learning flight? The clutter seems to be used more as an obstacle that they are detouring to reach the goal, isn't it?

      Based on many previous studies, view-based navigation can be assumed to be one of the plausible mechanisms bees use for homing (Cartwright & Collett, 1987; Doussot et al., 2020; Lehrer & Collett, 1994; Philippides et al., 2013; Zeil, 2022). In our tests, when the dense environment was shifted to a different position in the flight arena, almost no bees searched at the real location of the nest entrance; instead, they searched at the fictive new location within the dense environment. This indicates that the bees assumed the nest to be located within the dense environment, and therefore that vision played a crucial role for homing. We thus never meant to suggest that the bees were not using view-based navigation. We clarified this point in the revised manuscript.

      See lines 247-248, 250-259, added visual memory to schematic in Fig. 6

      In our model simulations, the memorised snapshots were centred around the nest. However, we found that a multi-snapshot model could not explain the behaviour of the bees. This led us to suggest that bees likely employ a combination of multiple mechanisms for navigation.

      We refined the paragraph about possible alternative homing mechanisms. See lines 218-263

      The height of learning flights has not been extensively investigated in previous studies, and typical heights are not well-documented in the literature. However, from our observations of the first outbound flights of bumblebees within the dense environment, we noted that they quickly increased their altitude and then flew above the objects. Since the objects had a height of 0.3 metres, we chose 0.45 metres as a height above the objects for our study.

      Furthermore, the nest is positioned within the arrangement of objects, making it a target the bees must actively find rather than detour around.

      I think a discussion to contrast your findings with Murray and Zeil 2017 will be useful. It was unclear to me whether the flight arena had UV availability, if it didn't, this could be a reason for the difference.

      We referred to this study in the discussion of the revised paper (see our response to the public review). Lines 192-197

      As in most lab studies on local homing, the bees did not have UV light available in the arena. Even without this, they were successful in finding their nest position during the tests. We clarified that in the revised manuscript. See lines 334-336

      Figure 2A, can you add a scale bar?

      We added a scale bar to the figure showing the dimensions of the arena. See Fig. 2

      The citation of figure orders is slightly off. We have Figure 5 after Figure 2, without citing Figures 3 and 4. Similarly for a few others.

      We carefully checked the order of cited figures and adapted them.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions: line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the dense environment but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing (neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we included model results with the arena wall in the supplements of the revised paper. See lines 290-293, Figures S17-21

      We agree that the catchment volumes would provide quantitatively more detailed information than catchment slices. Nevertheless, since our goal was to investigate whether bees would use ground views or bird's-eye views to home in a dense environment, catchment slices, which provide qualitatively similar information to catchment volumes, are sufficient to predict whether ground or bird's-eye views perform better in leading to the nest. Therefore, we did not include further computations of catchment volumes. (ll. 296-297)

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17. Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments (Baddeley et al., 2012; Dittmar et al., 2010; Doussot et al., 2020; Möller, 2012; Wystrach et al., 2011, 2013; Zeil, 2012). A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Since we observed behavioural responses different from those suggested by the models, it becomes interesting to look at the flight history. If we had found an alignment between the model and the behaviour, looking at the history would have become much less interesting. Thus, our results raise an interest in looking at the entire flight history, which will require not only effort on the recording procedure but also conceptual work. At the moment, the underlying mechanisms of learning during outbound, inbound, exploration, or orientation flights remain elusive, which makes it difficult to test hypotheses about them. A detailed description of the flights during the entire bee history would enable us to speculate about alternative models to the one tested in our study, but we would remain limited in testing them.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the dense environment.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled laboratory conditions. Both field and laboratory research are necessary and should complement each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of the components of the environment for the behaviour through targeted variation of them. These results yield precious information to then guide future field-based experiments for validation.

      Our laboratory settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was based on the knowledge that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and now refer to the environment as a "dense environment."

      We changed the wording throughout the manuscript and figures.

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work conditions, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious assessments of catchment areas in the context of local homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

      Reviewer 2 (Recommendations):

      (1) Clarify what is meant by modelling panoramic images at 1cm intervals (only?) along the x-axis of the arena.

      The panoramic images were taken along a grid with 0.5cm steps within the dense environment and 1cm steps in the rest of the arena. A previous study (Doussot et al., 2020) showed successful homing of multi-snapshot models in an environment of similar scale with a grid with 2cm steps. Therefore, we think that our scaling is sufficiently fine. We apologise for the missing information in the method section and added it to the revised manuscript. See lines 286-287

      (2) In Figures 9-12 what are the memory0 to memory7 locations and reference image orientations? Explain clearly which image comparisons generated the rotIDFs shown.

      Memory 0 to memory 7 are examples of the eight memorised snapshots, which are aligned in the nest direction and taken around the nest. In the rotIDFs shown, we took memory 0 as a reference image, and compared the 7 others by rotating them against memory 0. We clarified that in the revised manuscript.

      See revised figure caption in Fig. S9 – 16
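
      The comparison described in the response to point (2) can be illustrated with a minimal sketch of a rotational image difference function (rotIDF). This is not the authors' modelling code: it assumes panoramic images stored as lists of pixel rows whose columns span 360° of azimuth, and it uses the root-mean-square pixel difference as the mismatch measure, which is a common choice but an assumption here. The function names are hypothetical.

```python
import math


def rotate_azimuth(image, shift):
    """Rotate a panoramic image by `shift` columns along the azimuth.

    `image` is a list of rows (elevation) of pixel values (azimuth).
    """
    width = len(image[0])
    shift %= width
    return [row[shift:] + row[:shift] for row in image]


def rms_difference(image_a, image_b):
    """Root-mean-square pixel difference between two equally sized images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return math.sqrt(total / count)


def rot_idf(reference, current):
    """Image difference between `reference` and `current` at every rotation.

    Returns one value per azimuth shift applied to `current`.
    """
    width = len(reference[0])
    return [
        rms_difference(reference, rotate_azimuth(current, shift))
        for shift in range(width)
    ]


def best_rotation(reference, current):
    """Shift (in columns) that minimises the rotIDF, and the minimum value."""
    curve = rot_idf(reference, current)
    shift = min(range(len(curve)), key=curve.__getitem__)
    return shift, curve[shift]
```

      In this sketch, one memorised snapshot (for example memory 0) would be passed as `reference`, and each of the other snapshots in turn as `current`. A deep, narrow minimum of the resulting curve indicates that the two views can be aligned unambiguously.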

      (3) Figure 9 seems to compare 'bird's-eye', not 'frog's-eye' views.

      We apologise for that mistake and carefully double-checked the figure caption.

      See revised figure caption Fig. S9

      (4) Why do you need to invoke a PI vector (Figure 6) to explain your results?

      Since the bees were able to home in the dense environment without entering the object arrangement from above but from the side, image matching alone could not explain the bees’ behaviour. Therefore, we suggest, as a hypothesis for future studies, a combination of mechanisms such as a home vector. Other alternatives, perhaps without requiring a PI vector, may explain the bees’ behaviour, and we will welcome any future contributions from the scientific community.

      References

      Baddeley, B., Graham, P., Husbands, P., & Philippides, A. (2012). A Model of Ant Route Navigation Driven by Scene Familiarity. PLoS Computational Biology, 8(1), e1002336. https://doi.org/10.1371/journal.pcbi.1002336

      Capaldi, E. A., Smith, A. D., Osborne, J. L., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., & Riley, J. R. (2000). Ontogeny of orientation flight in the honeybee revealed by harmonic radar. Nature, 403. https://doi.org/10.1038/35000564

      Cartwright, B. A., & Collett, T. S. (1987). Landmark maps for honeybees. Biological Cybernetics, 57(1), 85–93. https://doi.org/10.1007/BF00318718

      Dittmar, L., Stürzl, W., Baird, E., Boeddeker, N., & Egelhaaf, M. (2010). Goal seeking in honeybees: Matching of optic flow snapshots? Journal of Experimental Biology, 213(17), 2913–2923. https://doi.org/10.1242/jeb.043737

      Doussot, C., Bertrand, O. J. N., & Egelhaaf, M. (2020). Visually guided homing of bumblebees in ambiguous situations: A behavioural and modelling study. PLoS Computational Biology, 16(10). https://doi.org/10.1371/journal.pcbi.1008272

      Lehrer, M., & Collett, T. S. (1994). Approaching and departing bees learn different cues to the distance of a landmark. Journal of Comparative Physiology A, 175(2), 171–177. https://doi.org/10.1007/BF00215113

      Lobecke, A., Kern, R., & Egelhaaf, M. (2018). Taking a goal-centred dynamic snapshot as a possibility for local homing in initially naïve bumblebees. Journal of Experimental Biology, 221(2), jeb168674. https://doi.org/10.1242/jeb.168674

      Möller, R. (2012). A model of ant navigation based on visual prediction. Journal of Theoretical Biology, 305, 118–130. https://doi.org/10.1016/j.jtbi.2012.04.022

      Murray, T., & Zeil, J. (2017). Quantifying navigational information: The catchment volumes of panoramic snapshots in outdoor scenes. PLOS ONE, 12(10), e0187226. https://doi.org/10.1371/journal.pone.0187226

      Osborne, J. L., Smith, A., Clark, S. J., Reynolds, D. R., Barron, M. C., Lim, K. S., & Reynolds, A. M. (2013). The ontogeny of bumblebee flight trajectories: From Naïve explorers to experienced foragers. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0078681

      Philippides, A., de Ibarra, N. H., Riabinina, O., & Collett, T. S. (2013). Bumblebee calligraphy: The design and control of flight motifs in the learning and return flights of Bombus terrestris. Journal of Experimental Biology, 216(6), 1093–1104. https://doi.org/10.1242/jeb.081455

      Sonntag, A., Lihoreau, M., Bertrand, O. J. N., & Egelhaaf, M. (2024). Bumblebees increase their learning flight altitude in dense environments. bioRxiv, 2024.10.14.618154. https://doi.org/10.1101/2024.10.14.618154

      Woodgate, J. L., Makinson, J. C., Lim, K. S., Reynolds, A. M., & Chittka, L. (2016). Life-long radar tracking of bumblebees. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0160333

      Wystrach, A., Mangan, M., Philippides, A., & Graham, P. (2013). Snapshots in ants? New interpretations of paradigmatic experiments. Journal of Experimental Biology, 216(10), 1766–1770. https://doi.org/10.1242/jeb.082941

      Wystrach, A., Schwarz, S., Schultheiss, P., Beugnon, G., & Cheng, K. (2011). Views, landmarks, and routes: How do desert ants negotiate an obstacle course? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 197(2), 167–179. https://doi.org/10.1007/s00359-010-0597-2

      Zeil, J. (2012). Visual homing: An insect perspective. Current Opinion in Neurobiology, 22(2), 285–293. https://doi.org/10.1016/j.conb.2011.12.008

      Zeil, J. (2022). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A. https://doi.org/10.1007/s00359-022-01599-2

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Household clustering and seasonal genetic variation of Plasmodium falciparum at the community-level in The Gambia" presents a valuable genetic spatio-temporal analysis of malaria-infected individuals from four villages in The Gambia, covering the period between December 2014 and May 2017. The majority of samples were analyzed using a SNP barcode with the Spotmalaria panel, with a subset validated through WGS. Identity-by-descent (IBD) was calculated as a measure of genetic relatedness and spatio-temporal patterns of the proportion of highly related infections were investigated. Related clusters were detected at the household level, but only within a short time period.

      Strengths:

      This study offers a valuable dataset, particularly due to its longitudinal design and the inclusion of asymptomatic cases. The laboratory analysis using the Spotmalaria platform combined and supplemented with WGS is solid, and the authors show a linear correlation between the IBD values determined with both methods, although other studies have reported that at least 200 SNPs are required for IBD analysis. Data-analysis pipelines were created for (1) variant filtering for WGS and subsequent IBD analysis, and (2) creating a consensus barcode from the Spotmalaria panel and WGS data and subsequent SNP filtering and IBD analysis.

      Weaknesses:

      Further refining the data could enhance its impact on both the scientific community and malaria control efforts in The Gambia.

      (1) The manuscript would benefit from improved clarity and better explanation of results to help readers follow more easily. Despite familiarity with genotyping, WGS, and IBD analysis, I found myself needing to reread sections. While the figures are generally clear and well-presented, the text could be more digestible. The aims and objectives need clearer articulation, especially regarding the rationale for using both SNP barcode and WGS (is it to validate the approach with the barcode, or is it to have less missing data?). In several analyses, the purpose is not immediately obvious and could be clarified.

      The text of the manuscript has now been thoroughly revised. But please let us know if a specific section remains unclear.

      (2) Some key results are only mentioned briefly in the text without corresponding figures or tables in the main manuscript, referring only to supplementary figures, which are usually meant for additional detail, but not main results. For example, data on drug resistance markers should be included in a table or figure in the main manuscript.

      We agree with the reviewer's suggestion to move the prevalence of drug resistance markers from the supplementary figures (previously Figure S8) to the main manuscript (now Figure 5). If other figures or tables should be moved to the main manuscript, please let us know.

      (3) The study uses samples from 2 different studies. While these are conducted in the same villages, their study design is not the same, which should be addressed in the interpretation and discussion of the results. Between Dec 2014 and Sept 2016, sampling was conducted only in 2 villages and at less frequent intervals than between Oct 2016 and May 2017. The authors should assess how this might have impacted their temporal analysis and conclusions drawn. In addition, it should be clarified why and in exactly which analyses the samples from Dec 2016 - May 2017 were excluded, as this is a large proportion of your samples.

      We have clarified which set of samples was used in our Results (Lines 293-295, 316-319). While two villages were recruited halfway through the study, two villages (J and K, Figure 1C) consistently provided data for each transmission season. Importantly, our temporal analysis accounts for these differences by grouping paired barcodes based on their respective locations (Figure 3B). Despite variations in sampling frequency, we still observe a clear overall decline in relatedness between the ‘0-2 months’ and ‘2-5 months’ groups, both of which include barcodes from all four villages.
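
      The comparison between time-interval groups mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the analysis pipeline used in the study: each barcode pair is reduced to a hypothetical `(months_apart, ibd)` tuple, pairs are called related at IBD >= 0.5 (the threshold quoted in point 7 of this review), and the bin edges mirror the ‘0-2 months’ and ‘2-5 months’ labels. Grouping by location is omitted for brevity.

```python
def proportion_related_by_interval(pairs, bins, threshold=0.5):
    """Fraction of related barcode pairs within each time-interval bin.

    pairs: iterable of (months_apart, ibd) tuples (hypothetical layout).
    bins: list of (label, low, high); a pair falls in a bin when
          low <= months_apart < high.
    Returns {label: fraction}, or None for a bin with no pairs.
    """
    counts = {label: [0, 0] for label, _, _ in bins}  # [related, total]
    for months_apart, ibd in pairs:
        for label, low, high in bins:
            if low <= months_apart < high:
                counts[label][1] += 1
                if ibd >= threshold:
                    counts[label][0] += 1
                break  # each pair belongs to at most one bin
    return {
        label: (related / total if total else None)
        for label, (related, total) in counts.items()
    }
```

      Called with bins such as `[("0-2 months", 0, 2), ("2-5 months", 2, 5)]`, the function returns one proportion per group, which is the quantity whose decline between groups is described in the response.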

      (4) Based on which criteria were samples selected for WGS? Did the spatiotemporal spread of the WGS samples match the rest of the genotyped samples? I.e. were random samples selected from all times and places, or was it samples from specific times/places selected for WGS?

      All P. falciparum positive samples were sent for genotyping and whole genome sequencing, ensuring no selection bias. However, only samples with sufficient parasite DNA were successfully sequenced. We have updated the text (Lines 129-130) and added a supplementary figure (Figure S4) to show the sample collection broken down by type of data (barcode or genome). High-quality genomes are distributed across all time points.

      (5) The manuscript would benefit from additional detail in the methods section.

      Please see our response in the section “Recommendation for the authors”.

      (6) Since the authors only do the genotype replacement and build consensus barcode for 199 samples, there is a bias between the samples with consensus barcode and those with only the genotyping barcode. How did this impact the analysis?

      While we acknowledge the potential for bias between samples with a consensus barcode (based on WGS) and those with genotyping-only barcodes, its impact is minimal. WGS does indeed produce a more accurate barcode compared to SNP genotyping, but any errors in the genotyping barcodes were mitigated by excluding loci that systematically mismatched with WGS data (see Figure S3). Additionally, the use of WGS improved the accuracy of 51% (216/425) of barcodes, which strengthens the overall quality and validity of our analysis.

      (7) The linear correlation between IBD-values of barcode vs genome is clear. However, since you do not use absolute values of IBD, but a classification of related (>=0.5 IBD) vs. unrelated (<0.5), it would be good to assess the agreement of this classification between the 2 barcodes. In Figure S6 there seem to be quite some samples that would be classified as unrelated by the consensus barcode, while they have IBD>0.5 in the Genome-IBD; in other words, the barcode seems to be underestimating relatedness.

      a. How sensitive is this correlation to the number of SNPs in the barcode?

      We measured the agreement between the two classifications using specificity (0.997), sensitivity (0.841) and precision (0.843) described in the legend of Figure S8. To further demonstrate the good agreement between the two methods, we calculated a Cohen’s kappa value of 0.839 (Lines 226, 290), indicative of a strong agreement (McHugh 2012). As expected, the correlation between IBD values obtained by both methods improves (higher Cohen’s kappa and R<sup>2</sup>) as the cutoff for the minimal number of comparable and informative loci per barcode pair is raised (data not shown).
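
      The agreement statistics named above can be computed from a 2x2 table of related/unrelated calls. The following is a minimal sketch, assuming the genome-based call is the reference and that a pair is "related" when IBD is at least 0.5; the function names and the toy values in the usage note are hypothetical and are not taken from the study.

```python
def confusion_counts(barcode_ibd, genome_ibd, cutoff=0.5):
    """Count TP, FP, FN, TN, treating the genome-based call as the reference."""
    tp = fp = fn = tn = 0
    for b, g in zip(barcode_ibd, genome_ibd):
        barcode_related = b >= cutoff
        genome_related = g >= cutoff
        if barcode_related and genome_related:
            tp += 1
        elif barcode_related:
            fp += 1
        elif genome_related:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def agreement_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision and Cohen's kappa for a 2x2 table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: both methods say "related" plus both say "unrelated".
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }
```

      For example, `agreement_metrics(*confusion_counts(barcode_values, genome_values))` would return all four statistics for two equally long lists of pairwise IBD values (both list names are placeholders).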

      (8) With the sole focus on IBD, a measure of genetic relatedness, some of the conclusions from the results are speculative.

      a. Why not include other measures such as genetic diversity, which  relates to allele frequency analysis at the population level (using, for example, nucleotide diversity)? IBD and the proportion of highly  related pairs are not a measure of genetic diversity. Please revise the  manuscript and figures accordingly.

      We agree that IBD is not a direct measure of genetic diversity, even though the two are related (Camponovo et al., 2023). More precisely, IBD is a measure of the level of inbreeding in the population (Taylor et al., 2019). We have updated our manuscript by replacing “genetic diversity” with “genetic relatedness” or “inbreeding/outcrossing” where appropriate. Nucleotide diversity would be relevant if we wanted to compare different settings, e.g. Africa vs Asia; however, this is not the case here.

      b. Additionally, define what you mean by "recombinatorial genetic  diversity" and explain how it relates to IBD and individual-level  relatedness.

      We considered the term ‘recombinatorial genetic diversity’ to be equivalent to the level of inbreeding in the population. Because this expression is rather uncommon, we decided to drop it from our manuscript and replace it with “inbreeding/outcrossing”.

      c. Recombination is one potential factor contributing to the loss of  relatedness over time. There are several other factors that could  contribute, such as mobility/gene flow, or study-specific limitations  such as low numbers of samples in the low transmission season and many  months apart from the high transmission samples.

      Indeed, the loss of relatedness could be attributed not only to the recombination of local cases but also to new parasites introduced by imported malaria cases. As we stated in our manuscript, previous studies have shown a limited effect of imported cases on maintaining transmission (Lines 72-74). Nevertheless, we cannot definitely exclude that imported cases have an effect on inbreeding levels, since we do not have access to genetic data of surrounding parasites at the time of the study. We updated the discussion accordingly (Lines 497-501).

      d. By including other measures such as linkage disequilibrium you could  further support the statements related to recombination driving the loss of relatedness.

      This commendable suggestion is actually part of an ongoing project focusing on the sharing of IBD fragments and how it correlates with linkage disequilibrium. However, we believe that this analysis would not fit in the scope of our manuscript which is really about spatio-temporal effects on parasite relatedness at a local scale.

      (9) While the authors conclude there is no seasonal pattern in the drug-resistant markers, one can observe a big fluctuation in the dhps haplotypes, which go down from 75% to 20% and then up and down again later. The authors should investigate this in more detail, as dhps is related to SP resistance, which could be important for seasonal malaria chemoprophylaxis, especially since the mutations in dhfr seem near-fixed in the population, indicating high levels of SP resistance at some of the time points.

      As the reviewer noted, the DHPS A437G haplotype appears to decrease in prevalence twice throughout our study: from the 2015 and 2016 high transmission seasons to the subsequent 2016 and 2017 low transmission seasons. Seasonal Malaria Chemoprophylaxis (SMC) was carried out in the area through the delivery of sulfadoxine–pyrimethamine plus amodiaquine to children 5 years old and younger during high transmission seasons. As DHPS A437G haplotype has been associated with resistance to sulfadoxine, its apparent increase in prevalence during high transmission seasons could be resulting from the selective pressure imposed on parasites. After SMC, the decrease in prevalence observed during low transmission seasons could be caused by a fitness cost of the mutation favouring wild-type parasites over resistant ones. We updated our manuscript to reflect this relevant observation (Lines 400-405).

      (10) I recommend that raw data from genotyping and WGS should be deposited in a public repository.

      Genotyping data is available in the supplementary table 4 (Table S4). Whole genome sequencing is accessible in a European Nucleotide Archive public repository with the identifiers provided in supplementary table 5 (Table S5). We added references to these tables in the manuscript (Lines 249-250).

      Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods of intense transmission at the beginning of the rainy season are interspersed with long periods of low to no transmission. This raises several questions about how this transmission pattern impacts the spatiotemporal distribution of circulating parasite strains. Knowledge of these dynamics may allow the identification of key units for targeted control strategies, the evaluation of the effect of selection/drift on parasite phenotypes (e.g., the emergence or loss of drug resistance genotypes), and the analysis, through the parasites' genetic nature, of the duration of chronic infections persisting during the dry season. Using a combination of barcodes and whole genome analysis, the authors try to answer these questions by making clever use of the different recombination rates, as measured through the proportion of genomes with identity-by-descent (IBD), to investigate the spatiotemporal relatedness of parasite strains at different spatial (i.e., individual, household, village, and region) and temporal (i.e., high, low, and the corresponding transitions) levels. The authors show that a large fraction of infections are polygenomic and stable over time, resulting in high recombinational diversity (Figure 2). Since the number of recombination events is expected to increase with time or with the number of mosquito bites, IBD allows them to investigate the connectivity between spatial levels and to measure the fraction of effective recombinational events over time. The authors demonstrate the epidemiological connectivity between villages by showing the presence of related genotypes, a higher probability of finding similar genotypes within the same household, and how parasite-relatedness gradually disappears over time (Figure 3). Moreover, they show that transmission intensity increases during the transition from dry to wet seasons (Figure 4). If there is no drug selection during the dry season and if resistance incurs a fitness cost, it is possible that alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      The use of genomic information and IBD analytic tools provides the  Control Program with important metrics for malaria control policies, for example, identifying target populations for malaria control and  evaluation of malaria control programs.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes  representing 101 bi-allelic SNPs) and 199 high-quality genome sequences  to infer the fraction of the genome with shared Identity by Descent  (IBD) (i.e. a metric of recombination rate) over several time points  covering two years. The barcode and whole genome sequence combination  allows full use of a large dataset, and to confidently infer the  relatedness of parasite isolates at various spatiotemporal scales.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate the impact of seasonality on malaria parasite population genetics. To achieve this, the researchers conducted a longitudinal study in a region characterized by seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants residing in four villages in the Upper River Region of The Gambia and tested for malaria parasite positivity. The parasites from the positive samples were genotyped using a genetic barcode and/or whole genome sequencing, followed by a genetic relatedness analysis.

      The study identified three key findings:

      (1) The parasite population continuously recombines, with no single genotype dominating, in contrast to viral populations;

      (2) The relatedness of parasites is influenced by both spatial and temporal distances; and

      (3) The lowest genetic relatedness among parasites occurs during the  transition from low to high transmission seasons. The authors suggest  that this latter finding reflects the increased recombination associated with sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and  self-explanatory. The methods are adequately described, providing a  solid foundation for the findings. While there are no unexpected  results, it is reassuring to see the anticipated outcomes supported by  actual data. The conclusions are generally well-supported; however, the  discussion on the burden of asymptomatic infections falls outside the  scope of the data, as no specific analysis was conducted on this aspect  and was not stated as part of the aims of the study. Nonetheless, the  recommendation to target asymptomatic infections is logical and  relevant.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript would benefit from additional detail in the methods section.

      a. Refer to Figure 1 when you describe the included studies and sample processing.

      We added the reference to Figure 1 (Line 131).

      b. While you describe each step in the pipeline, you do not specify the  tools, packages, or environment used (the GitHub link is also  non-functional). A graphic representation of the pipeline, with more  bioinformatic details than Supplementary Figure S1, would be helpful.  Add references to used tools and software created by others.

      The GitHub link has been updated and is now functional. We find Figure S1 already dense; adding more detail would undermine its purpose as an easily readable summary of our pipeline. Readers seeking an in-depth explanation of our pipeline may prefer to read the methods section instead. We are committed to crediting the authors of the tools that were essential for building our analysis pipeline. The two most relevant tools that we used are hmmIBD and the Fws calculation, both of which are cited in the methods (Lines 148-152, 214-215).

      c. What changed in the genotyping protocol after May 2016? Does it not  lead to bias in the (temporal) analysis by leaving these loci in for  samples collected before May 2016 and making them 'unknown' for the  majority of samples collected after this date?

      These 21 SNPs all clustered in 1 of the 4 multiplexes used for molecular genotyping, which likely failed to produce accurate base calls. We updated the text to include this information (Lines 198-200).

      The rationale behind the discarding of these 21 SNPs for barcodes sampled after May 2016 was that they were consistently mismatching with the WGS SNPs, probably due to genotyping error as mentioned above. However, by replacing these unknown positions in the molecular barcodes with WGS SNPs, 141 samples did recover some of these 21 SNPs with the accurate base calls (Figure S3A). Additionally, we added an extra analysis to assess the agreement between barcodes and WGS data (Figure S3B).

      d. Related to this, how are unknown and mixed genotypes treated in the  binary matrix? How is the binary matrix coded? Is 0 the same as the  reference allele? So all the missing and mixed are treated as  references? How many missing and mixed alleles are there, how often does it occur and how does this impact the IBD analysis?

      We acknowledge that the details that we provided regarding the IBD analysis were confusing. hmmIBD requires a matrix that contains a non-negative integer for each distinct allele at a given locus (all our loci were bi-allelic, thus only 0 and 1 were used) and -1 for missing data. In our case, we set missing and mixed alleles to -1, which were then ignored during the IBD estimation. The corresponding text was updated accordingly (Lines 173-175).
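
      The encoding described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the convention that 0 denotes the reference allele and 1 the alternate allele, the use of "X" for a missing call and "N" for a mixed call, and all function names are assumptions made for the example.

```python
MISSING = -1  # value used for missing data and, here, for mixed calls as well


def encode_call(call, ref, alt):
    """Encode one bi-allelic call: 0 = ref, 1 = alt, -1 = missing or mixed.

    Any symbol other than the two expected alleles (for instance the
    hypothetical "X" for missing or "N" for mixed) is mapped to -1.
    """
    if call == ref:
        return 0
    if call == alt:
        return 1
    return MISSING


def encode_barcode(calls, ref_alleles, alt_alleles):
    """Encode a whole barcode, one integer per locus."""
    return [encode_call(c, r, a) for c, r, a in zip(calls, ref_alleles, alt_alleles)]
```

      Loci encoded as -1 are then simply skipped when two barcodes are compared.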

      e. By excluding households with less than 5 comparisons, are you not preselecting households with high numbers of cases, and therefore higher likelihood of transmission within the household?

      All participants in each household were sampled at every collection time point. This sampling was unbiased with respect to the likelihood of transmission. Excluding pairs of households with fewer than 5 comparisons was necessary to ensure statistical robustness in our analyses. Besides, this does not necessarily restrict the analysis to households with a high number of cases, as it is the total number of pairs between households that must be at least 5 (for instance, these pairs would pass the cutoff: household with 1 case vs household with 5 cases; household with 2 cases vs household with 3 cases).
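
      The arithmetic behind the two examples above is simply the product of the case counts. A minimal sketch, assuming one usable barcode per case and that every cross-household pair yields a valid IBD value (the function names are hypothetical):

```python
def between_household_pairs(cases_a, cases_b):
    """Number of cross-household isolate pairs, one isolate per case."""
    return cases_a * cases_b


def passes_cutoff(cases_a, cases_b, min_pairs=5):
    """True when a pair of households has at least min_pairs comparisons."""
    return between_household_pairs(cases_a, cases_b) >= min_pairs
```

      Under these assumptions, 1 vs 5 cases gives 5 pairs and 2 vs 3 cases gives 6 pairs, so both pass, whereas 2 vs 2 cases gives only 4 pairs and would be excluded.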

      (2) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      See (6) in the Public Review.

      a. It would be good to get a better sense of the distribution of the nr  of SNPs in the barcode. The range is 30-89, and 30 SNPs for IBD is  really not that much.

      Adding the range of the number of available SNPs per barcode is indeed particularly relevant. We added a supplementary figure (Figure S5) showing the distribution of homozygous SNPs per barcode, showing that a very small minority of barcodes have only 30 SNPs available for IBD (average of 65, median of 64).

      b. Did you compare the nr of SNPs in the consensus vs. only genotyped  barcodes? Is there more missing data in the genotype-only barcodes?

      We added a supplementary figure (Figure S5) with the distribution of homozygous SNPs in consensus (216 samples) and molecular (209 samples) barcodes. Consensus barcodes have more homozygous SNPs (average 76, median 82) than molecular barcodes (average of 54, median of 53), showing the improvement resulting from using whole genome sequencing data.

      c. How was the cut-off/sample exclusion criteria of 30 SNPs in the barcode determined?

      As described above (Public review section 7.a.), we removed pairs of barcodes with fewer than 30 comparable loci (and 10 informative loci) because this led to a good agreement between IBD values obtained from barcodes and genomes while still retaining a majority of pairwise IBD values.

      d. Was there more/less IBD between sample pairs with a consensus barcode vs those with genotype-only barcodes?

      We separated pairwise IBD values into two groups: “within consensus” and “within molecular”. The percentage of related barcodes (IBD ≥ 0.5) was virtually identical between the “within consensus” (1.88 %) and “within molecular” (1.71 %) groups (χ<sup>2</sup> = 1.33, p value > 0.24).

      (3) Line 124: add a reference for the PCR method used.

      We have updated this information: varATS qPCR (Line 121).

      (4) Line 126, what is MN2100ff? Is this the catalogue number of the  cellulose columns? Please clarify and add manufacturer details.

      MN2100ff was a replacement for CF11. We added a link to the MalariaGen website describing the product and the procedure (Lines 124-125).

      (5) Line 143: Figure S7 is the first supplementary figure referenced. Change the order and make this Figure S1?

      The numbering of figures is now fixed.

      (6) Line 154: How many SNPs were in the vcf before filtering?

      There were 1,042,186 SNPs before filtering. This information was added to the methods (Line 168).

      (7) Line 156: Why is QUAL filtered at 10000? This seems extremely high.  (I could be mistaken, but often QUAL above 50 or so is already fine, why discard everything below 10000?). What is the range of QUAL scores in  your vcf?

      We originally used the QUAL > 10000 filter to make our analyses less computationally intensive while keeping enough relevant genetic information. We agree that requiring extremely high QUAL values adds nothing above a certain threshold, as they translate into infinitesimally low probabilities (10<sup>-(QUAL/10)</sup>) of the variant call being wrong. We therefore decided instead to require a minimal population minor allele frequency (MAF) of 0.01 to keep a variant, as this makes the IBD calculation more accurate (Taylor et al., 2019). Variant filtering with the MAF > 0.01 filter resulted in 27,577 SNPs with a minimal QUAL of 132. With a cutoff of 3000 available SNPs, we retrieved all 199 genomes previously obtained with the QUAL > 10000 condition. The methods have been updated accordingly (Lines 166-170).
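
      The relationship between QUAL and the error probability quoted above, 10<sup>-(QUAL/10)</sup>, can be made concrete with a short sketch (the function name is hypothetical):

```python
def qual_to_error_probability(qual):
    """Convert a Phred-scaled QUAL score to the probability that the call is wrong."""
    return 10 ** (-qual / 10)
```

      With this formula, a QUAL of 50 corresponds to an error probability of 10<sup>-5</sup>, and the minimal QUAL of 132 retained after MAF filtering corresponds to roughly 6.3 x 10<sup>-14</sup>, which illustrates why thresholds far above this add little.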

      (8) Line 161-165: How did you handle the mixed alleles in the hmmIBD  analysis for the WGS data? Did you set them as 0 as you do later on for  the consensus barcode?

      Mixed alleles and missing data were ignored. This translated into a value of -1 for the hmmIBD matrix and not 0 as we incorrectly stated previously. We updated our manuscript with this correct information (Lines 173-175).

      (9) Line 168-171: How many SNPs do you have in the WGS dataset after all the filtering steps? If the aim of the IBD with WGS was to validate the IBD-analysis with the barcode, wouldn't it make sense to have at least  200 loci (as shown in Taylor et al to be required for hmmIBD) in the WGS data? What proportion of comparisons were there with only 100 pairs of  loci? This seems like really few SNPs from WGS data.

      There were 27,577 SNPs overall in the 199 high quality genomes. In our analysis, we make the distinction between comparable and informative loci. For a locus to be comparable between two samples, both calls have to be homozygous. To be informative, it must be comparable and at least one of the two calls must correspond to the minor allele in the population. We borrowed this term and definition from the hmmIBD software, which directly reports the number of informative loci per pair. By keeping pairs with at least 100 informative SNPs, we aimed to reduce the number of samples appearing artificially related because only population major alleles are being compared. Pairs of genomes had between 1073 and 27466 of these, well above the recommended 200 loci in Taylor et al. (2019). We added more details on comparable and informative sites (Lines 152-160).
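
      The two definitions above can be expressed as a short counting routine. This is a sketch under stated assumptions rather than the hmmIBD implementation: calls are encoded as 0/1 with -1 for missing or mixed, the minor allele at each locus is supplied by the caller, and the function names are hypothetical. The default cutoffs are the ones stated in this response for barcodes (30 comparable and 10 informative loci).

```python
MISSING = -1  # missing or mixed call, ignored in comparisons


def count_loci(sample_a, sample_b, minor_alleles):
    """Return (comparable, informative) locus counts for one pair of samples."""
    comparable = 0
    informative = 0
    for a, b, minor in zip(sample_a, sample_b, minor_alleles):
        if a == MISSING or b == MISSING:
            continue  # not comparable: at least one call is not homozygous
        comparable += 1
        if a == minor or b == minor:
            informative += 1  # at least one sample carries the minor allele
    return comparable, informative


def passes_cutoffs(sample_a, sample_b, minor_alleles,
                   min_comparable=30, min_informative=10):
    """True when the pair has enough comparable and informative loci."""
    comparable, informative = count_loci(sample_a, sample_b, minor_alleles)
    return comparable >= min_comparable and informative >= min_informative
```

      A pair in which both samples carry the major allele at every shared locus would have many comparable loci but no informative ones, which is the situation the informative-locus cutoff is meant to exclude.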

      (10) Line 178: why remove the 12 loci that are absent from the WGS? Are  these loci also poorly genotyped in the spotmalaria panel?

      As our goal is to validate the reliability of the SNPs obtained by molecular genotyping, these 12 loci have to be removed, especially because we found a consistent discrepancy between genotyped and WGS SNPs, which cannot be tested if these SNPs are absent from the genomes.

      (11) Line 180-182: What do you mean by this sentence: "Genomic barcodes  are built using different cutoffs of within-sample MAF and aligned  against molecular barcodes from the same isolates." Is this the analysis presented in the supplementary figure and resulting in the cut-off of  MAF 0.2? Please clarify.

      A locus where both alleles are called can result from the presence of two distinct haploid genomes or from an error occurring during sequencing data acquisition or processing. To distinguish between the two, we empirically determined the cutoff of within-sample MAF above which the locus can be considered heterozygous and below which only the major allele is kept. The corresponding figure was indeed Figure S2 (referenced in the next sentence, Lines 192-195). We clarified our approach in the methods (Lines 190-192) and in the legends of Figure S2 and Figure S3.
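
      The thresholding rule described above can be sketched as follows. The default cutoff of 0.2 is the value the reviewer refers to; the use of read counts as input, the treatment of a within-sample MAF exactly equal to the cutoff as heterozygous, and the function name are assumptions made for illustration.

```python
def call_locus(ref_reads, alt_reads, maf_cutoff=0.2):
    """Call one bi-allelic locus from read counts.

    Returns "missing" with no coverage, "mixed" when the within-sample
    minor allele frequency reaches the cutoff, otherwise the major allele.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return "missing"
    within_sample_maf = min(ref_reads, alt_reads) / total
    if within_sample_maf >= maf_cutoff:
        return "mixed"  # treated as a genuine heterozygous call
    # Below the cutoff the minor allele is treated as noise.
    return "ref" if ref_reads > alt_reads else "alt"
```

      With these defaults, a locus with 95 reference and 5 alternate reads is called as the reference allele, while a 60/40 split is called mixed.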

      (12) Line 191: How often was there a mismatch between WGS and SNP barcode?

      We added a panel (Figure S3B) showing the average agreement of each SNP between molecular genotyping and WGS. We highlighted the 21 discrepant SNPs showing a lower agreement only for samples collected after May 2016.

      (13) Line 201-204: This part is unclear (as above for the WGS): did you  include sample pairs with more than 10 paired loci? But isn't 10 loci  way too few to do IBD analysis?

      We included pairs of samples with at least 30 comparable loci and 10 informative paired loci (refer to our answer to comment 8 for the difference between the two). We added more details regarding comparable and informative sites (Lines 152-160). Indeed, using fewer than 200 loci leads to an IBD estimation that is on average off by 0.1 or more (Taylor et al., 2019). However we showed that the barcode relatedness classification based on a cutoff of IBD (related when above 0.5, unrelated otherwise) was close enough to our gold standard using genomes (each pair having more than 1000 comparable sites). Because we use this classification approach rather than the exact value of barcode-estimated IBD in our study, our 30 minimum comparable sites cutoff seems sufficient.

      (14) Lines 206-207: which program did you use to analyse Fws?

      We did not use any program; we computed Fws following the method of Manske et al. (2012).

      (15) Line 233: "we attempted parasite genotyping and whole genome  sequencing of 522 isolates over 16 time points" => This is confusing, you did not do WGS of 522 samples, only 199 as mentioned in the next  sentence.

      We attempted whole genome sequencing on 331 isolates and molecular genotyping on 442 isolates with 251 isolates common between the two methods. We updated our text to clarify this point (Lines 247-252).

      (16) Lines 256-259: Add a range of proportions or some other summary  statistic in this section as you are only referring here to  supplementary figures to support these statements.

      The text has been updated (Lines 271-274).

      (17) Line 260: check the formatting of the reference "Collins22" as the rest of the document references are numbered.

      Fixed.

      (18) Figure 2/3:

      a. You could also inspect relatedness at the temporal level, by  adjusting the network figure where the color is village and shape is  time (month/year).

      Although visualising the effect of time on the parasite relatedness network would be a valuable addition, we did not find any intuitive and simple way of doing so. Using shapes to represent time might end up being more confusing than helpful, especially because the sampling was not done at fixed intervals.

      b. To further support the statement of clustering at the household  level, it might be useful to add a (supplementary) figure with the  network with household number/IDs as color or shape. In the network,  there seems to be a lot of relatedness within the villages and between  villages. Perhaps looking only at the distribution of the proportion of  highly related isolates is simplifying the data too much. Besides, there is no statistical difference between clustering at the household vs  within-village levels as indicated in Figure 3.

      Unfortunately, there are too many households (71 in Figure 2) to make a figure with one color or shape per household readable. The statistical test of the difference between the within household and within village relatedness yielded a p value above the cutoff of 0.05 (p value of 0.084). However, it is possible that the lack of significance arises from the relatively low number of data points available in the “within household” group. This is even more plausible considering the statistical difference of both “within household” and “within village” groups with “between village” group. Overall, our results indicate a decreasing parasite relatedness with spatial distance, and that more investigation would be needed to quantify the difference between “within household” and “within village” groups. 

      (19) Figure 4: Please add more description in the caption of this figure to help interpret what is displayed here. Figure 4A is hard to  interpret and does not seem to show more than is already shown in Figure 3A. What do the dots represent in Figure 4B? It is not clear what is  presented here.

      Compared to Figure 3A, Figure 4A enables the visualization of the relatedness between each individual pair of time points, which are later used in the comparison of relatedness between seasonal groups in Figure 4B. For this reason, we believe that Figure 4A should remain in the manuscript. However, we agree that the relationship between Figure 4A and Figure 4B is not intuitive in the way we presented it initially. For this reason, we added more details in the legend and modified Figure 4A to highlight the seasonal groups used in Figure 4B. 

      (20) Line 360-361: what did you do when haplotypes were not identical?

      We explained it in the methods section (Lines 144-146): in this case, only WGS haplotypes were kept.

      (21) Section chronic infections: it is important to mention that the  majority of chronic infections are individuals from the monthly  dry-season cohort.

      We added a statement about the 21 chronically infected individuals that were also part of the December 2016 – May 2017 monthly follow-up (Lines 423-426).

      (22) Lines 381-386: Did you investigate COI in these individuals? Could  it be co-circulating strains that you do not pick up at all times due to the consensus barcodes and discarding of mixed genotypes (and does not  necessarily show intra-host competition. That is speculation and should  perhaps not be in the results)?

      This is exactly what we think is happening. Due to the very nature of genotyping, only one strain may be observed at a time in the case of a co-infection, where distinct but related strains are simultaneously present in the host. The picked-up strain is typically the one with the highest relative abundance at the time of sampling. As the reviewer stated, fluctuation of strain abundance might not only be due to intra-host competition but also asynchronous development stages of the two strains. We added this observation to the manuscript (Lines 432-435).

      (23) Figure 6: highlight the samples where the barcode was not available in a different color to be able to see the difference between a non-matching barcode and missing data.

      We thank the reviewer for this great suggestion. We have now added the available barcodes to Figure 6, along with their level of relatedness to the dominant genotype of each continuous infection.

      (24) Improve the discussion by adding a clear summary of the main  findings and their implications, as well as study-specific limitations.

      The Discussion has been updated with a paragraph summarizing the primary results (Lines 451-457).

      (25) Line 445: "implying that the whole population had been replaced in just one year "

      a. What do you mean by replaced? Did other populations replace the existing populations? I am not sure the lack of IBD is enough to show that the population changed/was replaced. Perhaps it is more accurate to say that the same population evolved. Nevertheless, other measures such as genetic diversity and genetic differentiation or population structure would be more suitable to strengthen these conclusions.

      We agree that “replaced” was the wrong term in this case. We rather intended to describe how the numerous recombinations between malaria parasites completely reshaped the same initial population which gradually displayed lower levels of relatedness over time. We updated the manuscript accordingly (Lines 507-512).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 260: Remove Collins 22.

      Fixed.

      (2) Lines 270-274: 73 + 213 = 286 not 284; sum of percentages is equal to 101%.

      The numbers are correct: the 73 barcodes identical (IBD >= 0.9) to another barcode are a subset of the 213 related (IBD >= 0.5) to another barcode. However, we agree that this might be confusing, and we now consider barcodes to be related if they have an IBD between 0.5 and 0.9, while excluding those with an IBD >= 0.9. The text has been updated (Lines 299-301).
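
      The revised, non-overlapping categories can be written as a simple classifier. This is a minimal sketch using the thresholds stated above; the function name and the "unrelated" label for IBD below 0.5 are assumptions for illustration.

```python
def classify_pair(ibd, related_cutoff=0.5, identical_cutoff=0.9):
    """Assign a pairwise IBD value to one of three mutually exclusive classes."""
    if ibd >= identical_cutoff:
        return "identical"
    if ibd >= related_cutoff:
        return "related"  # 0.5 <= IBD < 0.9 under the default cutoffs
    return "unrelated"
```

      Because each pair falls into exactly one class, counts of "identical" and "related" pairs can be added without double counting.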

      (3) Section: "Independence of seasonality and drug resistance markers prevalence".

      The text has been revised and the supplementary figure is now a main figure.

      (4) For readers unaware of malaria control policy in the Gambia it would be helpful to have more details on the specifics of anti-malarial drug  administration.

      We added the drugs used in SMC (sulfadoxine-pyrimethamine and amodiaquine) and the first line antimalarial treatment in use in The Gambia during our study (Coartem) (Lines 383-388).

      Reviewer #3 (Recommendations for the authors):

      (1) The abstract is not as clear as the authors' summary. For example, I found the sentence starting with "with 425 P. falciparum..." hard to  follow.

      The abstract has been updated.

      (2) It is better to consistently use "barcode genotyping "or "genotyping by barcode". Sometimes "molecular genotyping" is used instead of  "barcode genotyping"

      We have now replaced all occurrences of “barcode genotyping” with “molecular genotyping” or “molecular barcode genotyping”. We prefer to stick with “molecular genotyping” as this lets us distinguish between the molecular and the genomic barcode.

      (3) The introduction is quite disjointed and does not provide a clear build-up to the gap in knowledge that the study is attempting to fill. Please revise.

      The Introduction has now been thoroughly revised.

      (4) Line 31 "with notable increase of parasite differentiation" is an interpretation and not an observation.

      We have modified that sentence (Lines 31-33).

      (5) Overall, the introduction requires substantial revision.

      The Introduction has now been thoroughly revised.

      (6) Line 70 "parasite population adapts..." I thought this required phenotypic analysis and not genetics?

      The idea is that a population of parasites may adapt to environmental conditions (such as seasonality) through selection of the fittest genotypes. For instance, antimalarial exposure selects for parasites with specific mutations in drug resistance related genes, and this effect even appears to be transient (for example with chloroquine). As such, there is good reason to think that seasonality might have a similar effect on parasite genetics.

      (7) Line 129-130: the #442 is not reflected in the schematic Figure 1.

      This is an intentional choice to keep the figure concise. For this reason, we included Figure S1, which provides more details on the data collection and analysis pipeline.

      (8) Line 242-243: "Made with natural earth". What is this?

      This is a statement acknowledging the use of Natural Earth data to produce the map presented in Figure 1A.

      (9) Line 260: "collins22", is this a reference?

      Fixed.

      (10) Line 269-70. Very hard to follow. Please revise.

      We changed the text (Lines 293-297).

      (11) Line 324: similarly... I think there is a typo here.

      We did not find any typo in this specific sentence. However, “Similarly to Figure 3” may read awkwardly, so we changed it to “As in Figure 3” (Line 351).

      (12) Line 332-334: very hard to follow. Please revise. Again, the lower parasite relatedness during the transition from low to high was linked to recombination occurring in the mosquito, but what about infection burden shifting to naive young children? Is there a role for host immunity in the observed reduction in parasite relatedness during the transition period?

      This text has been rewritten (Lines 356-361).

      Regarding the hypothesis of infection burden shifting to naïve young children, this question is difficult to address in The Gambia because children under 5 years old received Seasonal Malaria Chemoprophylaxis during the high transmission season. In older children (6-15 years old), the prevalence was similar to adults (Fogang et al., 2024).

      Regarding the role of host immunity in parasite relatedness across time and space, our dataset is too small to divide into different age groups. Further studies should address this very interesting question.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K<sup>+</sup>. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1, in the supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      [Reviewer 1, Comment 1] While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCl) but with a different mechanism, that they provide definite proof for. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCl depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same time course as the KCl depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      [Reviewer 1, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To reflect the scope and results of this study more accurately, and also to take into account Reviewer 2’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Finally, when [K<sup>+</sup>]-induced membrane potential changes are involved, factors other than cell volume changes appear to influence T<sub>2</sub> changes. Our follow-up study shows that, for the same T<sub>2</sub> change, the volume change differs between two situations: pure osmotic volume changes versus [K<sup>+</sup>]-induced volume changes. For example, for the same T<sub>2</sub> change, the volume change for depolarization is greater than the volume change for hypoosmotic conditions. We will present these results at the upcoming ISMRM 2025 meeting and are also preparing a manuscript to report them shortly.

      [Reviewer 1, Comment 2] So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      [Reviewer 1, Response 2] Regarding cell swelling at rapid response times: if we define cell swelling simply as an “increase in cell volume,” several studies report transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neurons during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly mentioned as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T<sub>2</sub> and PSR) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in Response 1 above.
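
      The two-compartment modelling that the reviewer outlines in Comment 2 can be sketched as follows. This is a minimal illustration, not the analysis used in this study. It assumes the fast-exchange limit, in which the observed relaxation rate is the population-weighted mean of the compartment rates, whereas the reviewer notes that water in the pellet is in slow/intermediate exchange with the ECF. The function name and every numerical value are hypothetical placeholders, not measured data.

```python
def observed_t2_fast_exchange(f_intra, t2_intra_ms, t2_extra_ms):
    """Observed T2 (ms) of two water pools in the fast-exchange limit.

    Assumption: exchange is fast on the T2 time scale, so the relaxation
    rates (1/T2) of the pools average by population fraction.
    """
    if not 0.0 <= f_intra <= 1.0:
        raise ValueError("f_intra must lie in [0, 1]")
    rate_per_ms = f_intra / t2_intra_ms + (1.0 - f_intra) / t2_extra_ms
    return 1.0 / rate_per_ms


# Hypothetical baseline state: 80% of the water is intracellular.
baseline_t2_ms = observed_t2_fast_exchange(0.80, 80.0, 400.0)

# Hypothetical swollen state: a larger intracellular fraction and a longer
# intracellular T2 (water entering the cell dilutes macromolecules).
swollen_t2_ms = observed_t2_fast_exchange(0.82, 85.0, 400.0)

delta_t2_ms = swollen_t2_ms - baseline_t2_ms
print(f"baseline {baseline_t2_ms:.1f} ms, swollen {swollen_t2_ms:.1f} ms, "
      f"change {delta_t2_ms:+.1f} ms")
```

      Swelling changes both the population fractions and the compartment T2 values, so both must be supplied; the sign and size of the predicted change depend entirely on those inputs.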

      There are a few smaller issues that should be addressed.

      [Reviewer 1, Comment 3] (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      [Reviewer 1, Response 3] We appreciate the reviewer’s suggestion regarding imaging sequences. In fact, we used dictionaries for fitting in vivo T<sub>2</sub> decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T<sub>2</sub> maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      [Reviewer 1, Comment 4] (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      [Reviewer 1, Response 4] The T<sub>2</sub> decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. These features are inherent to the pulse sequence rather than artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T<sub>2</sub> decay curve using the technique developed by McPhee and Wilman (2017).

      [Reviewer 1, Comment 5] (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      [Reviewer 1, Response 5] We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We described the imaging slice more clearly in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We clarified this point in the revised manuscript to avoid any misunderstanding.
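
      The reviewer’s point in Comment 5, that a single fitted T2 becomes a mixture of the compartment values when the decay is really biexponential, can be illustrated with a short simulation. This is a sketch under stated assumptions, not the fitting procedure used in this study: the echo times, population fraction, and compartment T2 values are hypothetical, and the log-linear least-squares fit is chosen only for simplicity.

```python
import math


def biexponential_signal(te_ms, frac_short, t2_short_ms, t2_long_ms):
    """Noise-free signal from two non-exchanging pools (unit total amplitude)."""
    return (frac_short * math.exp(-te_ms / t2_short_ms)
            + (1.0 - frac_short) * math.exp(-te_ms / t2_long_ms))


def fit_monoexponential_t2(te_list_ms, signals):
    """Fit ln(S) = ln(S0) - TE/T2 by ordinary least squares; return T2 in ms."""
    n = len(te_list_ms)
    log_signals = [math.log(s) for s in signals]
    mean_te = sum(te_list_ms) / n
    mean_log = sum(log_signals) / n
    sxx = sum((te - mean_te) ** 2 for te in te_list_ms)
    sxy = sum((te - mean_te) * (y - mean_log)
              for te, y in zip(te_list_ms, log_signals))
    slope = sxy / sxx  # equals -1/T2 for a pure monoexponential decay
    return -1.0 / slope


# Hypothetical protocol: 50 echo times from 10 ms to 500 ms.
echo_times_ms = [10.0 * k for k in range(1, 51)]

# Hypothetical pools: 70% of the signal with T2 = 60 ms, 30% with T2 = 300 ms.
signals = [biexponential_signal(te, 0.7, 60.0, 300.0) for te in echo_times_ms]

apparent_t2_ms = fit_monoexponential_t2(echo_times_ms, signals)
print(f"apparent monoexponential T2: {apparent_t2_ms:.1f} ms")
```

      The apparent T2 falls between the two compartment values, and where it falls depends on the sampled echo times as well as on the population fractions.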

      [Reviewer 1, Comment 6] (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCl solutions that I could find.

      [Reviewer 1, Response 6] As requested by the reviewer, we included the absolute values in the supplementary information.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K<sup>+</sup> and Ba<sup>2+</sup> concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      [Reviewer 2, Comment 1] The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      [Reviewer 2, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we had already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in the Response 1 to Reviewer 1’s Comment 1 above.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and also consider the reviewer’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      [Reviewer 1, Comment 7] The manuscript is well written. One thing to emphasize early on is that the KCl depolarization is done in an equimolar (or isotonic) manner. I was not clear on this point until I got to the very end of the methods. This is a strength of the paper and should be presented earlier.

      [Reviewer 1, Response 7] In response to the reviewer’s suggestion, we have revised the manuscript to present the equimolar characteristic of our experiment earlier.

      [Reviewer 1, Comment 8] In terms of experiments, the relaxation time measurements are not well constructed. They should be done with a CPMG sequence with hundreds of echoes and properly curve fit. This is entirely possible on a Bruker spectrometer.

      [Reviewer 1, Response 8] As noted in our Response to Reviewer 1’s Comment 3, while a CPMG sequence with numerous echoes and straightforward curve fitting can be effective, it is less feasible for in vivo experiments. Our multi-echo spin-echo sequence was a balanced approach between spatial resolution, reasonable scan duration, and the need to localize signals within specific regions of interest.

      [Reviewer 1, Comment 9] Measurements of cell swelling should be done to determine the time course of the cell swelling. This could be with NMR (CPMG) or with light scattering. For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity.

      [Reviewer 1, Response 9] We acknowledge that additional experiments, such as cell volume recording, would further strengthen the claims of this study. We will perform them in future studies.

      As noted in our Response 2 to Reviewer 1’s Comment 2, this study does not address rapid membrane potential changes on the millisecond scale, and we acknowledge that establishing the precise timing of cell swelling is crucial for fully understanding the mechanisms of DIANA. Our current work demonstrates that MR parameters (e.g., T<sub>2</sub> and PSR) correlate strongly with membrane potential-modulating ionic environments, but it does not extend to millisecond-scale neural activation. We recognize the importance of further experiments, such as direct cell volume measurements, and plan to incorporate them in future studies to build on the insights gained from the present work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few comments, questions, and suggestions for improvement:

      [Reviewer 2, Comment 2] I could not find much information about the various incubation times and delays used for the authors' in vitro experiments. For each of the in vitro experiments in particular, how long were cells exposed to the stated ionic condition prior to imaging, and how long did the imaging take? Could this and any other relevant information about the experimental timing please be provided and added to the methods section?

      [Reviewer 2, Response 2] We have included the information about the preparation/incubation times in the revised manuscript. For the scan time, it was already stated in the original manuscript: 23 minutes for the single-echo spin-echo sequence and 23 minutes for the inversion-recovery multi-echo spin-echo, for a total of 46 minutes.

      [Reviewer 2, Comment 3] In what format were the cells used for patch clamping, and were any controls done to ensure that characteristics of these cells were the same as those pelleted and imaged in the MRI studies? How long were the incubation times with ionic solutions in the patch clamp experiment? This information should likewise be added to the paper.

      [Reviewer 2, Response 3] We have clarified in the revised manuscript that SH-SY5Y cells were patch clamp-measured in their adherent state. For MRI, by contrast, the cells were dissociated from the culture plate and pelleted, so the experimental environments were not entirely identical. The patch clamp experiments involved a 20–30 minute incubation period with the ionic solutions. We have included this information in the revised manuscript.

      [Reviewer 2, Comment 4] Can the authors provide information about the mean cell size observed under each condition in their in vitro experiments?

      [Reviewer 2, Response 4] We did not directly quantify the mean cell size for each in vitro condition in this study, so we do not have corresponding data. However, we acknowledge that this information could provide valuable insights into potential mechanisms underlying the observed MR parameter changes. In future experiments, we plan to include direct cell-size measurements to further elucidate how changes in cell volume or hydration contribute to our MR findings.

      [Reviewer 2, Comment 5] The ionic challenges used both in vitro and in vivo could also have affected cell permeability, with corresponding effects that would be detectable in diffusion weighted imaging. Did the authors examine this or obtain any results that could reflect on contributions of permeability properties to the contrast effects they report?

      [Reviewer 2, Response 5] We did not perform diffusion-weighted imaging and therefore do not have direct data regarding changes in cell permeability. We agree that incorporating diffusion-weighted measurements could help distinguish whether the MR parameter changes are driven primarily by membrane potential shifts, cell volume changes, or variations in permeability properties. We will consider these approaches in our future studies.

      [Reviewer 2, Comment 6] Clearly, a faster stimulation method such as optogenetics, in combination with time-locked MRI readouts of the pelleted cells, would be more effective at demonstrating a useful relationship between cellular neurophysiology and MRI contrast in vitro. Can the authors present data from such an experiment? Is there any information they can present that documents the time course of observed responses in their experiments?

      [Reviewer 2, Response 6] In the current study, our methodology did not include time-resolved or dynamic measurements. While it may be possible to obtain indirect information about the temporal dynamics using T<sub>2</sub>-weighted or MT-weighted imaging, such an experiment was beyond the scope of this work. However, we agree that an optogenetic approach with time-locked MRI acquisitions could help directly link cell physiology to MRI contrast, and we will explore this in future studies.

      [Reviewer 2, Comment 7] The authors used a drug cocktail to suppress hemodynamic effects in the experiments of Figs. 5-6. What evidence is there that this cocktail successfully suppresses hemodynamic responses and that it also preserves physiological responses to the ionic challenges used in their experiments? Were analogous in vivo results also obtained in the absence of the cocktail?

      [Reviewer 2, Response 7] We appreciate the reviewer’s concern regarding pharmacological suppression of hemodynamic effects. Although each component is known to inhibit nitric oxide synthesis, we did not directly measure the degree of hemodynamic suppression in this study. In addition, we cannot definitively confirm that these agents preserved the physiological responses to the ionic challenges. We have clarified these points in the revised manuscript and identified them as limitations of the study.

      [Reviewer 2, Comment 8] Why weren't PSR results reported as part of the in vivo experimental results in Fig. 5? Does PSR continue to vary inversely to T2 in these experiments?

      [Reviewer 2, Response 8] In our current experimental setup, acquiring the T<sub>2</sub> map four times required 48 minutes, and extending the scan to include additional quantitative MT measurements for PSR would have significantly prolonged the scanning session. Given that these experiments were conducted on acutely craniotomized rats, maintaining stable physiological conditions for such a long period of time was challenging. Therefore, due to time constraints, we did not perform MT measurements and focused on T<sub>2</sub> mapping.

      [Reviewer 2, Comment 9] The authors have established in vivo optogenetic stimulation paradigms in their laboratory and used them in the Toi et al. DIANA study. Were T2 or PSR changes observed in vivo using standard T2 measurement or T2-weighted imaging methods that do not rely on the DIANA pulse sequence they originally applied?

      [Reviewer 2, Response 9] Our current T<sub>2</sub> mapping experiments utilized a standard multi-echo spin-echo sequence, rather than the DIANA pulse sequence employed in our previous work. In this respect, the T<sub>2</sub> changes we observed in vivo do not rely on the specialized DIANA methodology.

      [Reviewer 2, Comment 10] In the discussion section, the authors state that to their knowledge, theirs "is the first report that changes in membrane potential can be detected through MRI." This cannot be true, as their own Toi et al. Science paper previously claimed this, and a number of the studies cited on p.2 also claimed to detect close correlates of neuroelectric activity. This statement should be amended or revised.

      [Reviewer 2, Response 10] We appreciate the reviewer’s comment. We have revised the discussion section of the manuscript to reflect the points raised by the reviewer.

      [Reviewer 2, Comment 11] Because the current study does not actually demonstrate that changes in membrane potential can be detected by MRI, the authors should alter the title, abstract, and a number of relevant statements throughout the text to avoid implying that this has been shown. The title, for instance, could be changed to "Responses to depolarizing and hyperpolarizing ionic solutions measured by magnetic resonance imaging of excitable cells and rat brains," or something along these lines.

      [Reviewer 2, Response 11] We appreciate the reviewer’s suggestions. We have revised the title, abstract, and relevant statements of the manuscript to clarify that our findings show MR-detectable responses to ionic solutions that are expected to modulate membrane potential, rather than demonstrating direct detection of membrane potential changes by MRI.

      [Reviewer 2, Comment 12] The axes in Fig. 3 seem to be mislabeled. I think the horizontal axes are supposed to be membrane potential measured in mV.

      [Reviewer 2, Response 12] We thank the reviewer for catching this error. We have corrected the axis labels in Figure 3 to indicate membrane potential (in mV) on the horizontal axis.

      [Reviewer 2, Comment 13] Since neither the experiments in Jurkat cells (Fig. 4) nor the in vivo MRI tests (Fig. 5-6) appear to have been made in conjunction with membrane potential measurements, it seems like a stretch to refer to these experiments as involving manipulation of membrane potentials per se. Instead, the authors should refer to them as involving administration of stimuli expected to be depolarizing or hyperpolarizing. The "hyperpolarization" and "depolarization" labels of Fig. 4 similarly imply a result that has not actually been shown, and should ideally be changed.

      [Reviewer 2, Response 13] To avoid any misleading implication that membrane potential changes were directly measured in Jurkat cells or in vivo, we have revised the relevant text and figure labels.

      [Reviewer 2, Comment 14] The changes in T2 and PSR documented with various K<sup>+</sup> challenges to Jurkat cells in Fig. 4 seem to follow a step-function-like profile that differs from the results reported in SH-SY5Y cells. Can the authors explain what might have caused this difference?

      [Reviewer 2, Response 14] We currently do not have a definitive explanation for why Jurkat cells exhibit a step-function-like response to varying K<sup>+</sup> levels, whereas SH-SY5Y cells show a linear response to log [K<sup>+</sup>]. Experiments that include direct membrane potential measurements in Jurkat cells would help clarify whether this difference arises from genuinely different patterns of depolarization/hyperpolarization or from other factors. We have revised the manuscript to address this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This fascinating manuscript studies the effect of education on brain structure through a natural experiment. Leveraging the UK BioBank, these authors study the causal effect of education using causal inference methodology that focuses on legislation for an additional mandatory year of education in a regression discontinuity design. 

      Strengths: 

      The methodological novelty and study design were viewed as strong, as was the import of the question under study. The evidence presented is solid. The work will be of broad interest to neuroscientists 

      Weaknesses: 

      There were several areas which might be strengthened by additional consideration from a methodological perspective. 

      We sincerely thank the reviewer for the useful input, in particular, their recommendation to clarify RD and for catching some minor errors in the methods (such as taking the log of the Bayes factors). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The fuzzy local-linear regression discontinuity analysis would benefit from further description. 

      (2) In the description of the model, the terms "smoothness" and "continuity" appear to be used interchangeably. This should be adjusted to conform to mathematical definitions. 

      We have now expanded our explanation of continuity-based regression discontinuity. In particular, we now explain “fuzzy” and add emphasis on the two separate empirical approaches (continuity and local randomization), along with fixing our use of “smoothness” and “continuity”.

      results:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”
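
      The idea of a fuzzy design can be written compactly. The notation below is introduced here for illustration and does not appear in the manuscript excerpt: X is the running variable (date of birth), c is the cutoff, D indicates staying in school until 16, and Y is a brain outcome. The fuzzy RD estimand is the jump in the outcome at the cutoff scaled by the jump in the probability of receiving treatment:

```latex
\tau_{\text{fuzzy}}
  = \frac{\lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x]}
         {\lim_{x \downarrow c} \mathbb{E}[D \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[D \mid X = x]}
```

      With a first-stage jump of roughly 0.10, as estimated above for the UK Biobank sample, the jump in the outcome is divided by about 0.10 to obtain the effect among compliers.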

      (3) The optimization of the smoother based on MSE would benefit from more explanation and consideration. How was the flexibility of the model taken into account in testing? Were there any concerns about post-selection inference? A sensitivity analysis across bandwidths is also necessary. Based on the model fit in Figure 1, results from a linear model should also be compared. 

      It is common in the RD literature to illustrate plots with higher-order polynomial fits while inference is based on linear (or at most quadratic) models (Cattaneo, Idrobo & Titiunik, 2019). We agree that this field-specific practice can be confusing to readers. Therefore, we have redone Figure 1 using local-linear fits better aligning with our analysis pipeline. Yet, it is still not a one-to-one alignment as point estimation and confidence are handled robustly while our plotting tools are simple linear fits. In addition, we updated Sup. Fig 3 and moved 3rd-order polynomial RD plots to Sup. Fig 4.

      Empirical RD has many branching analytical decisions (bandwidth, polynomial order, kernel) which can have large effects on the outcome. Fortunately, RD methodology is starting to become more standardized (Cattaneo & Titiunik, 2022, Annual Review of Economics), as there have been indications of publication bias using these methods (Stommes, Aronow & Sävje, 2023, Research and Politics; this paper suggests the cause is not researcher degrees of freedom but rather inappropriate inferential methods). While not necessarily ill-intended, researcher degrees of freedom and analytic flexibility are major contributors to publication bias. We limited our own analytic flexibility by pre-registering the study (https://osf.io/rv38z).

      One of the most consequential analytic decisions in RD is the bandwidth size: there is no established practice, and bandwidths are context-specific and can be highly influential on the results. The choice of bandwidths can be framed as a ‘bias vs. variance trade-off’. As bandwidths increase, variance decreases since more subjects are added, yet bias (misspecification error/smoothing bias) also increases (as these subjects are further away and less similar). In our case, our assignment (running/forcing) variable is ‘date of birth in months’; therefore, our smallest comparison would be individuals born in August 1957 (unaffected/no treatment) vs September 1957 (affected/treated). This comparison has the least bias (subjects are the most similar to each other), yet it comes at the expense of very few subjects (high variance in our estimate). 
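
      The trade-off described above can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the function names `local_linear_rd` and `simulate`, the uniform kernel, the curved trend, and the list of bandwidths are hypothetical and are not our analysis pipeline, which relies on robust inference from dedicated RD software. The sketch fits a separate line on each side of the cutoff and takes the difference between the two intercepts.

```python
import numpy as np


def local_linear_rd(x, y, cutoff, bandwidth):
    """Sharp RD estimate with a uniform kernel.

    Fits y = a + b * (x - cutoff) separately on each side of the cutoff,
    using only observations within `bandwidth`, and returns the difference
    between the two intercepts together with the number of observations used.
    """
    def intercept_at_cutoff(mask):
        # np.polyfit returns coefficients from highest to lowest degree.
        slope, intercept = np.polyfit(x[mask] - cutoff, y[mask], deg=1)
        return intercept

    left = (x >= cutoff - bandwidth) & (x < cutoff)
    right = (x >= cutoff) & (x < cutoff + bandwidth)
    estimate = intercept_at_cutoff(right) - intercept_at_cutoff(left)
    return estimate, int(left.sum() + right.sum())


def simulate(n_per_month=40, months=60, effect=0.0, seed=0):
    """Hypothetical data: a curved (cubic) trend plus an optional jump at 0."""
    rng = np.random.default_rng(seed)
    # Running variable in whole months relative to the cutoff.
    x = np.repeat(np.arange(-months, months), n_per_month).astype(float)
    trend = 0.01 * x + 0.00001 * x**3
    y = trend + effect * (x >= 0) + rng.normal(0.0, 1.0, x.size)
    return x, y


x, y = simulate()  # true effect is zero
for h in (2, 6, 12, 24, 48):
    est, n = local_linear_rd(x, y, cutoff=0.0, bandwidth=h)
    print(f"bandwidth={h:>2} months  n={n:>5}  estimate={est:+.3f}")
```

      Narrow windows use few observations and give noisy estimates, whereas wide windows use many more observations but lean on a linear approximation to a trend that is not linear.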

      MSE-derived bandwidths attempt to solve this issue by offering an automatic method to choose an analysis bandwidth in RD. Specifically, this aims to minimize the MSE of the local polynomial RD point estimator – effectively choosing a bandwidth by balancing the ‘bias vs. variance trade-off’ (explained in detail in Section 4.4.2 of Cattaneo et al., 2019, pp. 45–51, “A practical introduction to regression discontinuity designs: foundations”). Yet, you are right to highlight potential overfitting issues, as MSE-optimal bandwidths are “by construction invalid for inference” (Calonico, Cattaneo & Farrell, 2020, p. 192). Quoting from Cattaneo and Titiunik’s Annual Review of Economics from 2022: 

      “Ignoring the misspecification bias can lead to substantial overrejection of the null hypothesis of no treatment effect. For example, back-of-the-envelope calculations show that a nominal 95% confidence interval would have an empirical coverage of about 80%.”

      Fortunately, modern RD analysis packages (such as rdrobust or RDHonest) calculate robust confidence intervals; for more details see Armstrong and Kolesár (2020). For a summary of MSE-optimal bandwidths see the section “Why is it hard to estimate RD effects?” in Stommes and colleagues 2023 (https://arxiv.org/abs/2109.14526). For a more in-depth treatment see the Cattaneo, Idrobo, and Titiunik primer (https://arxiv.org/abs/1911.09511).

      Lastly, with MSE-derived bandwidths, sensitivity tests only make sense within a narrow window of the MSE-optimized bandwidth (Section 5.5 of Cattaneo et al., 2019, pp. 106–107). When a significant effect occurs, placebo cutoffs (artificially moving the cutoff) and donut-hole analysis are great sensitivity tests. Instead of testing our bandwidths, we decided to use an alternate RD framework (local randomization) in which we compare 1-month and 5-month windows. Across all analysis strategies, MRI modalities, and brain regions, we do not find any effects of the education policy change ROSLA on long-term neural outcomes.

      (4) In the Bayesian analysis, the authors deviated from their preregistered analytic plan. This whole section is a bit confusing in its current form - for example, point masses are not wide but rather narrow. Bayes factors are usually estimated; it is unclear how or why a prior was specified. What exactly is being modeled using a prior? Also, throughout - If the log was taken, as the methods seem to indicate for the Bayes factor, this should be mentioned in figures and reported estimates. 

      First, we would like to thank you for spotting that we incorrectly kept the log in the methods. We have fixed this and added the following sentence to the methods: 

      “Bayes factors are reported as BF<sub>10</sub> in support of the alternative hypothesis, we report Bayes factors under 1 as the multiplicative inverse (BF<sub>01</sub> = 1/BF)”

      All Bayesian analyses need to have a prior. In practice, this becomes an issue when you’re uncertain about 1) the location of the effect (directionality & center mass, defined by a location parameter), yet more importantly, the 2) confidence/certainty of the range-spread of possible effects (determined by a scale parameter). In normally distributed priors these two ‘beliefs’ are represented with a mean and a standard deviation (the latter impacts your confidence/certainty on the range of plausible parameter space). 

      Supplementary figure 6 illustrates several distributions (location = 0 for all) with varying scale parameters; when used as Bayesian priors this indicates differing levels of confidence in our certainty of the plausible parameter space. We illustrate our three reported, normally distributed priors centered at zero in blue with their differing scale parameters (sd = .5, 1 & 1.5).

      All of these five prior distributions have the same location parameter (i.e., 0) yet varying differences in the scale parameter – our confidence in the certainty of the plausible parameter space. At first glance it might seem like a flat/uniform prior (not represented) is a good idea – yet, this would put equal weight on the possibility of every estimate thereby giving the same probability mass to implausible values as plausible ones. A uniform prior would, for instance, encode the hypothesis that education causing a 1% increase in brain volume is just as plausible as it causing either a doubling or halving in brain volume. In human research, we roughly know a range of reasonable effect sizes and it is rare to see massive effects.

      A benefit of ‘weakly-informative’ priors is that they limit the range of plausible parameter values. The default prior in STAN (a popular Bayesian estimation program; https://mc-stan.org) is a normally distributed prior with a mean of zero and an SD of 2.5 (seen in orange in the figure; our initial preregistered prior). This large standard deviation easily permits positive and negative estimates putting minimal emphasis on zero. Contrast this to BayesFactor package’s (Morey R, Rouder J, 2023) default “wide” prior which is the Cauchy distribution (0, .7) illustrated in magenta (for more on the Cauchy see: https://distribution-explorer.github.io/continuous/cauchy.html). 

      These different defaults reflect differing Bayesian philosophical schools (‘estimate parameters’ vs ‘quantify evidence’ camps); if your goal is to accurately estimate a parameter it would be odd to have a strong null prior, yet (in our opinion) when estimating point-null BFs a wide default prior gives far too much evidence in support of the null. In point-null BF testing the Savage-Dickey density ratio is the ratio between the height of the prior at 0 and the height of the posterior at zero (see Figure under section “testing against point null 0”). This means BFs can be very prior sensitive (seen in SI tables 5 & 6). For this reason, we thought it made sense to do prior sensitivity testing: to ensure our conclusions in favor of the null were not caused solely by an overly wide prior (the preregistered orange distribution), we decided to report the three narrower priors (blue ones).
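
      The prior sensitivity of a point-null Bayes factor can be shown with a minimal sketch. It assumes a normal approximation to the likelihood and a zero-centered normal prior, so the posterior is available in closed form; the estimate, standard error, and function names are hypothetical and this is not our analysis code. Under these assumptions BF<sub>01</sub> is the posterior density at zero divided by the prior density at zero.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(estimate, se, prior_sd):
    """BF01 for a point null at 0 with a Normal(0, prior_sd) prior.

    Assumes the likelihood is approximately Normal(estimate, se), which
    gives a normal posterior by conjugacy.
    """
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * estimate / se**2
    posterior_at_zero = normal_pdf(0.0, post_mean, math.sqrt(post_var))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero


# Hypothetical effect estimate and standard error, for illustration only.
estimate, se = 0.02, 0.10
for prior_sd in (0.5, 1.0, 1.5, 2.5):
    bf01 = savage_dickey_bf01(estimate, se, prior_sd)
    print(f"prior sd={prior_sd:<4} BF01={bf01:6.2f}  BF10={1.0 / bf01:.3f}")
```

      For the same data, a wider prior has less density at zero, so the ratio tilts further toward the null; this is why we report results under several prior scales.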

      Alternative Bayesian null hypothesis testing methods, such as using Bayes factors to test against a null region and ‘region of practical equivalence’ testing, are less prior sensitive, yet both methods require the researcher (i.e., us) to decide on a minimal effect size of practical interest. Once a minimal effect size of interest is determined, any effect within this boundary is taken as evidence in support of the null hypothesis.

      (5) It is unclear why a different method was employed for the August / September data analysis compared to the full-time series. 

      We used a local-randomization RD framework, an entirely different empirical framework than continuity methods (resulting in a different estimate). For an overview see the primer by Cattaneo, Idrobo & Titiunik 2023 (“A Practical Introduction to Regression Discontinuity Designs: Extensions”; https://arxiv.org/abs/2301.08958).

      A local randomization framework is optimal when the running variable is discrete (as in our case with DOB in months) (Cattaneo, Idrobo & Titiunik 2023). It makes stronger assumptions on exchangeability therefore a very narrow window around the cutoff needs to be used. See Figure 2.1 and 2.2 (in the Cattaneo, Idrobo & Titiunik 2023) for graphical illustrations of 1) a randomized experiment, 2) a continuity RD design, and 3) local-randomization RD. Using the full-time series in a local randomization analysis is not recommended as there is no control for differences between individuals as we move further away from the cutoff – making the estimated parameter highly endogenous.
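
      The logic of local randomization can be sketched as follows. If assignment is as good as random inside a very narrow window, the two groups can be compared as in a randomized experiment. The example below uses a permutation test on simulated data, a frequentist stand-in chosen for brevity rather than the Bayesian analysis we report; the function name, sample sizes, and data are hypothetical.

```python
import numpy as np


def randomization_test(control, treated, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Treats group labels inside the window as exchangeable and compares the
    observed difference with differences obtained after shuffling the labels.
    """
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    observed = treated.mean() - control.mean()
    pooled = np.concatenate([control, treated])
    n_treated = treated.size

    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_treated].mean() - shuffled[n_treated:].mean()
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical outcomes for people born just before and just after the cutoff.
rng = np.random.default_rng(1)
august = rng.normal(0.0, 1.0, 150)
september = rng.normal(0.0, 1.0, 150)
diff, p = randomization_test(august, september)
print(f"difference in means={diff:+.3f}, permutation p-value={p:.3f}")
```

      The exchangeability assumption is only plausible close to the cutoff, which is why this approach is restricted to narrow windows such as one or five months.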

      We understand how it is confusing to have both a new framework and Bayesian methods (we could have chosen a fully frequentist approach) but using a different framework allows us to weigh up the aforementioned ‘bias vs variance tradeoff’ while Bayesian methods allow us to say something about the weight of evidence (for or against) our hypothesis.

      (6) Figure 1 - why not use model fits from those employed for hypothesis testing? 

      This is a great suggestion (and ties into #3); we have now redone Figure 1.

      (7) The section on "correlational effect" might also benefit from additional analyses and clarifications. Indeed, the data come from the same randomized experiment for which minimum education requirements were adjusted. Was the only difference that the number of years of education was studied as opposed to the cohort? If so, would the results of this analysis be similar in another subsample of the UK Biobank for which there was no change in policy?

      We have clarified the methods section for the correlational/associational effect. This was the same subset of individuals for the local randomization analysis; all we did was change the independent variable from an exogenous dummy-coded ROSLA term (where half of the sample had the natural experiment) to a continuous (endogenous) educational attainment IV. 

      In principle, the results from the associational analysis should be exactly the same if we use other UK Biobank cohorts. To see if the association of education attainment with the global neuroimaging cohorts was similar across sub-cohorts of new individuals, we conducted post hoc Bayesian analysis on eight more sub-cohorts of 10-month intervals, spaced 2 years apart from each other (Sup. Figure 7; each indicated by a different color). Four of these sub-cohorts predate ROSLA, while the other four are after ROSLA. Educational attainment is slowly increasing across the cohorts of individuals born from 1949 until 1965; intriguingly, the effect of ROSLA is visually evident in the distributions of educational attainment (Sup. Figure 7). Also, as seen in the cohorts predating ROSLA, more and more individuals were (already) choosing to stay in education past 15 years of age (see cohort 1949 vs 1955 in Sup. Figure 7).

      Sup. Figure 8 illustrates boxplots of the educational attainment posterior of the eight sub-cohorts in addition to our original analysis (s1957) using a normally distributed prior with a mean of 0 and an sd of 1. Total surface area shows a remarkably replicable association with education attainment. Yet, it is evident the “extremely strong” association we found for CSF was a statistical fluke – as the posterior of other cohorts (bar our initial test) crosses zero. The conclusions for the other global neuroimaging covariates where we concluded ‘no associational effect’ seem to hold across cohorts.

      We have now added methods, deviation from preregistration, and the following excerpt to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood. 

      Strengths: 

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses). 

      Weaknesses: 

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent with educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after a second education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no. 

      The work also does not address the impact of the UK Biobank's well-known healthy volunteer bias (Fry et al., 2017) which is yet further magnified in the imaging extension study (Littlejohns et al., 2020). Under-representation of people with low EA will dilute the effects of EA and impact the interpretation of these results. 

      References: 

      Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology, 186(9), 1026-1034. https://doi.org/10.1093/aje/kwx246 

      Littlejohns, T. J., Holliday, J., Gibson, L. M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J. D., Boultwood, C., Collins, R., Conroy, M. C., Crabtree, N., Doherty, N., Frangi, A. F., Harvey, N. C., Leeson, P., Miller, K. L., Neubauer, S., Petersen, S. E., Sellors, J., ... Allen, N. E. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nature Communications, 11(1), 2624. https://doi.org/10.1038/s41467-020-15948-9 

      We thank the reviewer for the positive comments and constructive feedback, in particular, their emphasis on volunteer bias in UKB (similar points were mentioned by Reviewer 3). We have now addressed these limitations with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We also highlighted it both in the results and methods.

      We appreciate that one year of education may seem modest compared to the entire educational trajectory, but as an intervention, we disagree that one year of education is ‘a very modest manipulation’. It is arguably one of the largest positive manipulations in childhood development we can administer. If we were to translate a year of education into the language of a (cognitive) intervention, it is clear that the manipulation, at least in terms of hours, days, and weeks, is substantial. Prior work on structural plasticity (e.g., motor, spatial & cognitive training) has involved substantially more limited manipulations in time, intensity, and extent. There is even (limited) evidence of localized persistent long-term structural changes (Woollett & Maguire, 2011, Curr. Biol.).

      We have now also highlighted the limited generalizability of our findings since we estimate a ‘local’ average treatment effect. It is possible higher education (college, university, vocational schools, etc.) could impact brain structure, yet we see no theoretical reason why it would while secondary education wouldn’t. Moreover, higher education is even trickier to research empirically due to heightened self and administrative selection pressures. While we cannot discount this possibility, the impacts of endogenous factors such as genetics and socioeconomic status are most likely heightened. That being said, higher education offers exciting possibilities to compare more domain-specific processes (e.g., by comparing a philosophy student to a mathematics student). Causality could be tested in European systems with point entry into field-specific programs – allowing comparison of students who just missed entry criteria into one topic and settled for another.

      Regarding the amount of time following the manipulation, as we highlight in our discussion this is both a weakness and a strength. Viewed from a developmental neuroplasticity lens it would have been nice to have imaging immediately following the manipulation. Yet, from an aging perspective, our design has increased power to detect an effect.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors assert there is no strong causal evidence for EA on brain structure. This overlooks work from Mendelian Randomisation, e.g. this careful work: https://pubmed.ncbi.nlm.nih.gov/36310536/ ... evidence from (good quality) MR studies should be considered. 

      We thank the reviewer for highlighting this well-done Mendelian randomization study. We have now added this citation and removed previous claims on the “lack of causal evidence existing”. We refrain from discussing Mendelian randomization, as it would need to be accompanied by a nuanced discussion on the strong limitations regarding EduYears-PGS in Mendelian randomization designs.

      (2) Tukey/Boxplot is a good name for your identification of outliers but your treatment of outliers has a well-recognized name that is missing: Winsorisation. Please add this term to your description to help the reader more quickly understand what was done. 

      Thanks, we have now added the term winsorized.
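
      For readers unfamiliar with the term, the combination of Tukey's boxplot rule with winsorization can be sketched as below. This is an illustrative sketch, not our exact preprocessing: the function name and the fence multiplier `k=1.5` are assumptions. Values beyond the fences are pulled back to the fence rather than dropped.

```python
import numpy as np


def winsorize_tukey(values, k=1.5):
    """Clip values to Tukey's fences: [Q1 - k * IQR, Q3 + k * IQR].

    Observations outside the fences are replaced by the nearest fence value,
    so no observations are removed.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return np.clip(values, q1 - k * iqr, q3 + k * iqr)


# Hypothetical measurements with one extreme value.
raw = [1, 2, 3, 4, 5, 100]
print(winsorize_tukey(raw))
```

      Unlike trimming, this keeps the sample size unchanged while limiting the influence of extreme observations.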

      (3) Nowhere is it plainly stated that "fuzzy" means that you allow for imperfect compliance with the exposure, i.e. some children born before the cut-off stayed in school until 16, and some born after the cut-off left school before 16. For those unfamiliar with RD it would be very helpful to explain this at or near the first reference of the term "fuzzy". 

      We have now clarified the term ‘fuzzy’ in the results and methods:

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      (4) Supplementary Figure 2 never states what the percentage actually measures. What exactly does each dot represent? Is it based on UK Biobank subjects with a given birth month? If so clarify. 

      Fixed!

      Reviewer #3 (Public review): 

      Summary: 

      This study investigates evidence for a hypothesized, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume, and diffusivity. 

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year-olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality. 

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in primary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships. 

      Strengths: 

      (1) A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples. 

      (2) This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry, and intended analyses, with only minor, justifiable changes in the final analysis. 

      (3) The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education. 

      (4) The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others. 

      (5) The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area. 

      Weaknesses: 

      (1) This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year-old British students born around 1957 who also participated in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario. 

      (2) The authors address potential concerns about the validity of modelling assumptions and the sensitivity of the regression discontinuity design approach. However, the possibility of selection and cohort bias remains and is not discussed clearly in the paper. Other studies (e.g. Davies et al 2018, https://www.nature.com/articles/s41562-017-0279-y) have used the same policy intervention to study other health-related outcomes and have established ROSLA as a valid naturalistic experiment. Still, quoting Davies et al. (2018), "This assumes that the participants who reported leaving school at 15 years of age are a representative sample of the sub-population who left at 15 years of age. If this assumption does not hold, for example, if the sampled participants who left school at 15 years of age were healthier than those in the population, then the estimates could underestimate the differences between the groups.". Recent studies (Tyrrell 2021, Pirastu 2021) have shown that UK Biobank participants are on average healthier than the general population. Moreover, the imaging sub-group has an even stronger "healthy" bias (Lyall 2022). 

      (3) The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. Mentioned only briefly, the inclusion and exclusion of covariates in the model are not discussed in detail. Standard imaging confounds such as head motion and scanning site have been included but other factors (e.g. physical exercise, smoking, socioeconomic status, genetics, alcohol consumption, etc.) may also play a role. 

      We thank the reviewer for their numerous positive comments and have now attempted to address the first two limitations (generalizability and UKB bias) with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43]; however, design-based internal validity threats still exist, for example if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We further highlight this in the results section:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA [41]. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) [46–48].”

      Healthy volunteer bias can create two types of selection bias; crucially, participation itself can serve as a collider threatening internal validity (outlined in van Alten et al., 2024; https://academic.oup.com/ije/article/53/3/dyae054/7666749). Natural experimental designs are partially sheltered from this major limitation, as ‘volunteer bias’ would have to differentially impact individuals on one side of the cutoff and not the other – thereby breaking a primary design assumption of regression discontinuity. Substantial prior work (including this article) has not found any threats to the validity of the 1973 ROSLA (Clark & Royer 2010, 2013; Barcellos et al., 2018, 2023; Davies et al., 2018, 2023). While the Davies 2018 article did IP-weight the UK Biobank sample, Barcellos and colleagues (2018, 2023) did not, noting: “Although the sample is not nationally representative, our estimates have internal validity because there is no differential selection on the two sides of the September 1, 1957 cutoff – see Appendix A.”

      The second (more acknowledged & arguably less problematic) type of selection bias results in threats to external validity (aka generalizability). As highlighted in your first point; this is a large limitation with every natural experimental design, yet in our case, this is further amplified by the UK Biobank’s healthy volunteer bias. We have now attempted to highlight this limitation in the discussion passage above.

      Point 3 – the inability to fully confirm design validity – is, again, an inherent limitation of a natural experimental approach. That being said, extensive prior work has tested different predetermined covariates in the 1973 ROSLA (cited within), and to our knowledge, no issues have been found. The 1973 ROSLA seems to be one of the better natural experiments around (there was also a concerted effort to have an ‘effective’ additional year; see Clark & Royer 2010). For these reasons, we only tested the variables we wanted to use to increase precision (also offering new neuroimaging covariates that did not exist in the literature base). One additional benefit of ROSLA is that the cutoff was defined, years after the fact, on a variable already fixed in the past (date of birth), making it particularly hard for adolescents to alter their assignment.

      Reviewer #3 (Recommendations for the authors): 

      (1) FMRIB's preprocessing pipeline is mentioned. Does this include deconfounding of brain measures? Particularly, were measures deconfounded for age before the main analysis? 

      This is such a crucial point that we triple-checked: brain imaging phenotypes were not corrected for age (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf). Large effects of age can be seen in the global metrics; older individuals have less surface area, thinner cortices, less brain volume (corrected for head size), more CSF volume (corrected for head size), more white matter hyperintensities, and worse FA values. Figure 1 shows these large age effects, which are controlled for in our continuity-based RD analysis.

      One’s date of birth (DOB) of course does not map perfectly onto their age at scan, which is why we included the covariate ‘visit date’; this interplay can now be seen in our updated SI Figure 1 (recommended in #3), which shows the distributions of visit date, DOB, and age at scan.

      In a valid RD design covariates should not be necessary (as they should be balanced on either side of the cutoff), yet the inclusion of covariates does increase precision to detect effects. We tested this assumption, finding ‘visit date’ and its quadratic term to be unrelated to ROSLA (Sup. Table 1). This adds further evidence (specific to the UK Biobank sample) to the existing body of work showing that the 1973 ROSLA policy change does not violate any design assumptions. Threats to internal validity would more than likely increase endogeneity and result in false-positive causal effects (which is not what we find).
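      The precision argument above can be illustrated with a minimal sketch on simulated data. This is not the RDHonest pipeline used in the study: it is a plain local-linear fit with a uniform kernel and homoskedastic standard errors, and the function name `rd_estimate`, the simulated variable `visit`, and all numeric settings are hypothetical choices made only for illustration.

```python
import numpy as np

def rd_estimate(y, running, covariates=None, bandwidth=1.0):
    """Local-linear RD estimate of the jump at running == 0 (uniform kernel).

    Returns (jump estimate, homoskedastic OLS standard error).
    Illustrative only; no bias-aware inference is attempted.
    """
    keep = np.abs(running) <= bandwidth
    r = running[keep]
    treated = (r >= 0).astype(float)
    # Intercept, jump at the cutoff, and separate slopes on each side.
    cols = [np.ones_like(r), treated, r, treated * r]
    if covariates is not None:
        cols.append(covariates[keep])
    X = np.column_stack(cols)
    yk = y[keep]
    beta, *_ = np.linalg.lstsq(X, yk, rcond=None)
    resid = yk - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

# Simulated example: the covariate is balanced across the cutoff (it is
# independent of the running variable) but strongly predicts the outcome.
rng = np.random.default_rng(0)
n = 4000
running = rng.uniform(-1, 1, n)   # e.g. centred date of birth (hypothetical)
visit = rng.normal(size=n)        # e.g. standardised visit date (hypothetical)
y = 0.5 * running + 2.0 * visit + rng.normal(size=n)  # no true jump

est_plain, se_plain = rd_estimate(y, running)
est_cov, se_cov = rd_estimate(y, running, covariates=visit)
print(f"without covariate: SE = {se_plain:.3f}")
print(f"with covariate:    SE = {se_cov:.3f}")
```

      Because the covariate is unrelated to assignment, adding it should not change what is being estimated; it only absorbs outcome variance, which is why the standard error of the jump shrinks.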

      (2) Despite the large overall sample size, I am wondering whether the effective number of samples is sufficient to detect a potentially subtle effect that is further attenuated by the long time interval before scanning. As stated, for the optimised bandwidth window (DoB 20 to 35 months around cut-off), N is about 5000. Does this mean that effectively about 250 (10%) out of about 2500 participants born after the cut-off were leaving school at 16 rather than 15 because of ROSLA? For the local randomisation analysis, this becomes about N=10 (10% out of 100). Could a power analysis show that these cohort sizes are large enough to detect a reasonably large effect? 

      This is a very valid point, one which we were grappling with while the paper was out for review. We now draw attention to this in the results and highlight this as a limitation in the discussion. While UKB’s non-representativeness limits our power (10% affected rather than 25% in the general population), it is still a very large sample. Our sample size is more in line with standard neuroimaging studies than with large cohort studies. 

      The novelty of our study is its causal design. While we could very precisely measure an effect of some phenotype (variable X) in 40,000 individuals, this effect is probably not what we think we are measuring. Without IP-weighting it could even have a different sign. More importantly, it is not variable X itself but the thousands of things (unmeasured confounders) that lead an individual to have more or less of variable X. The larger the sample, the easier it is for small unmeasured confounders to reach significance (the ‘big data paradox’). This in no way invalidates large samples; rather, we hope that how we think about and handle large samples will shift towards a more causal lens.

      (3) Supplementary Figure 1: A similar raincloud plot of date of birth would be instructive to visualise the distribution of subjects born before and after the 1957 cut-off. 

      Great idea! We have done this in Sup Fig. 1 for both visit date and DOB.

      (4) p.9: Not sure about "extreme evidence", very strong would probably be sufficient. 

      As preregistered, we interpreted Bayes Factors using Jeffreys’ criteria. ‘Extreme evidence’ is only used once, and it concerns an associational effect of educational attainment on CSF (BF10 > 100). Following Reviewer 1’s recommendation 7, we analysed eight replication samples (Sup. Figure 7 & 8) and have now added the following passage to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”
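      For readers unfamiliar with the verbal labels, the mapping from a Bayes factor to a category can be sketched as follows. The thresholds (1, 3, 10, 30, 100) and labels assume the commonly used adaptation of Jeffreys' scale in which BF10 > 100 is called ‘extreme’; the function name and the handling of values exactly on a boundary are illustrative assumptions, not taken from the preregistration.

```python
def evidence_label(bf10):
    """Verbal label for a Bayes factor BF10 (evidence for H1 over H0).

    Assumed thresholds: 1-3 anecdotal, 3-10 moderate, 10-30 strong,
    30-100 very strong, >100 extreme. Values below 1 are inverted and
    reported as evidence for H0.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 == 1:
        return "no evidence"
    favoured = "H1" if bf10 > 1 else "H0"
    bf = bf10 if bf10 > 1 else 1.0 / bf10
    if bf > 100:
        strength = "extreme"
    elif bf > 30:
        strength = "very strong"
    elif bf > 10:
        strength = "strong"
    elif bf > 3:
        strength = "moderate"
    else:
        strength = "anecdotal"
    return f"{strength} evidence for {favoured}"

print(evidence_label(150))   # a BF10 above 100, as reported for CSF
print(evidence_label(0.2))   # a BF10 below 1 favours the null
```

      On this scale the reviewer's suggested wording ‘very strong’ corresponds to the band just below the one the authors report, so the choice of label follows mechanically from the preregistered thresholds rather than from a judgment call.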

      (5) The code would benefit from a bit of clean-up and additional documentation. In its current state, it is not easy to use, e.g. in a replication study. 

      We have now added further documentation to our code, including a readme describing what each script does. The analysis pipeline used is not ideal for replications, as the package used for continuity-based RD (RDHonest) initially could not handle covariates; we therefore manually corrected our variables after a discussion with Prof Kolesár (https://github.com/kolesarm/RDHonest/issues/7).

      Prof Kolesár added this functionality recently, and future work should use the latest version of the package as it can correct for covariates. We have a new preprint examining the effect of 1972 ROSLA on telomere length in the UK Biobank using the latest package version of RDHonest (https://www.biorxiv.org/content/10.1101/2025.01.17.633604v1). To ensure maximum availability of such innovations, we will ensure the most up-to-date version of this script becomes available on this GitHub link (https://github.com/njudd/EduTelomere).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Strengths:

      Great genetic model. Phenotypic consequences at the organ and organismal levels are well investigated. The requirement of both Tbx3 and Tbx5 for maintaining VCS cell state has been demonstrated.

      We thank Reviewer #1 for acknowledging the effort involved in generating and characterizing the Tbx3/Tbx5 double conditional knockout mouse model and for highlighting the significance of this work in elucidating the role of these transcription factors in maintaining the functional and transcriptional identity of the ventricular conduction system. 

      Weaknesses:

      The actual cell state of the Tbx3/Tbx5 deficient conducting cells was not investigated in detail, and therefore, these cells could well only partially convert to working cardiomyocytes, and may, in reality, acquire a unique state.

      We agree with Reviewer #1 that the Tbx3/Tbx5 double mutant ventricular conduction myocardial cells may only partially convert to working cardiomyocytes or may acquire a unique state.  The transcriptional state of the double mutant VCS cells was investigated by bulk profiling of key genes associated with specific conduction and non-conduction cardiac regions, including fast conduction, slow conduction, or working myocardium. Neither the bulk transcriptional approaches nor the optical mapping approaches we employed capture single-cell data; in both cases, the data represents aggregated signals from multiple cells (1, 2). Single cell approaches for transcriptional profiling and cellular electrophysiology would clarify this concern and are appropriate for future studies. 

      (1) O’Shea C, Nashitha Kabri S, Holmes AP, Lei M, Fabritz L, Rajpoot K, Pavlovic D (2020) Cardiac optical mapping – State-of-the-art and future challenges. The International Journal of Biochemistry & Cell Biology 126:105804. doi: 10.1016/j.biocel.2020.105804.

      (2) Efimov IR, Nikolski VP, and Salama G (2004) Optical Imaging of the Heart. Circulation Research 95:21-33. doi: 10.1161/01.RES.0000130529.18016.35.

      Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive the adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with MinK-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      We appreciate Reviewer #2’s detailed summary of the study’s aims, methodologies, and findings, as well as their thoughtful suggestions for further analysis. We are grateful for their recognition of our genetic model’s novelty and robustness.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function.

      We agree with this reviewer’s assessment of the assertions in our manuscript.  The novel combined Tbx5/Tbx3 double mutant model does not further explore the TBX5/TBX3 ratio concept, which we previously examined in detail (1). Instead, as the Reviewer notes, this manuscript focuses on testing a model that the coordinated activity of Tbx3 and Tbx5 defines specialized ventricular conduction identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model.

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line.

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes.

      (4) Isolation of genetically modified VCS cells using flow.

      We thank Reviewer #2 for acknowledging the listed strengths of our study.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates the expression of VCS genes, which seem like distinct roles in VCS patterning. However, the authors move between different descriptions of the functional relationship and epistatic relationship between these factors, including terms like "cooperative", "coordinated", and "distinct" at various points. In a similar vein, sometimes terms like "reversion" are used to describe how VCS cells change after Tbx3/Tbx5 conditional knockout, and other times "transcriptional shift" and at other times "reprogramming". But these are all different concepts. The lack of a clear and consistent terminology for describing the phenomena observed makes the overarching claims of the manuscript more difficult to evaluate.

      We distinguish prior work on the “long-standing and well-supported model”, which investigated the roles of Tbx5 and Tbx3 independently, from this work examining the coordinated role of Tbx5 and Tbx3. Prior work demonstrated that Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, consistent with the reviewer’s suggestion of their distinct roles in VCS patterning. However, the current study evaluates, for the first time, the combined role of Tbx3 and Tbx5 in distinguishing specialized conduction identity from working myocardium.

      We appreciate Reviewer #2’s feedback regarding the need for consistent terminology when describing the impact of the double Tbx3 and Tbx5 mutant. We will edit the manuscript to replace terms like “reversion” with “transcriptional shift” or “transformation” when describing the observed phenotype, and we will use “coordination” to describe the combined role of Tbx5 and Tbx3 in maintaining VCS-specific identity.

      (2) A more direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from a single Tbx5 KO is presented for comparison to the double KO.

      We thank Reviewer #2 for the suggestions that a direct comparison between Tbx5 single conditional knockout and Tbx3/Tbx5 double conditional knockout models may help isolate the specific contribution of Tbx3 deletion in addition to Tbx5 deletion. 

      Previous studies have assessed the effect of single Tbx5 CKO in the VCS of murine hearts (1, 3, 5). Arnolds et al. demonstrated that the removal of Tbx5 from the adult ventricular conduction system results in VCS slowing, including prolonged PR and QRS intervals, prolongation of the His duration and His-ventricular (HV) interval (3).

      Furthermore, Burnicka-Turek et al. demonstrated that the single conditional knockout of Tbx5 in the adult VCS caused a shift toward a pacemaker cell state, with ectopic beats and inappropriate automaticity (1). Whole-cell patch clamping of VCS-specific Tbx5 deficient cells revealed action potentials characterized by a slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization - features characteristic of nodal action potentials rather than typical VCS action potentials (3). These observations were interpreted as uncovering nodal potential of the VCS in the absence of Tbx5. Based on the role of Tbx3 in CCS specification (2), we hypothesized that the nodal state of the VCS uncovered in the absence of Tbx5 was enabled by maintained Tbx3 expression. This motivated us to generate the double Tbx5/Tbx3 knockout model to examine the state of the VCS in the absence of both T-box TFs.

      In the current study, we demonstrate that the VCS-specific deletion of Tbx3 and Tbx5 results in the loss of fast electrical impulse propagation in the VCS, similar to that observed in the single Tbx5 mutant. However, unlike the Tbx5 single mutant, the Tbx3/Tbx5 double deletion does not cause a gain of pacemaker cell state in the VCS. Instead, the physiological data suggest a transition toward non-conduction working myocardial physiology. This conclusion is supported by the presence of only a single upstroke in the optical action potential (OAP) recorded from the His bundle region and VCS cells in Tbx3/Tbx5 double conditional knockout mice. The electrical properties of VCS cells in the double knockout are functionally indistinguishable from those of ventricular working myocardial cells. As a result, ventricular impulse propagation is significantly slowed, resembling activation through exogenous pacing rather than the rapid conduction typically associated with the VCS. We will edit the text of the manuscript to more carefully distinguish the observations between these models, as suggested.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265. PMID: 15289437.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      And

      the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment).

      We appreciate Reviewer #2’s suggestion to expand the gene expression analysis in Tbx3/Tbx5-deficient VCS cells by including other specific genes and comparisons with “native”/actual working ventricular myocardial cells and broadening the gene panel. In this study, we evaluated core cardiac conduction system markers, revealing a loss of conduction system-specific gene expression in the double mutant VCS. Furthermore, we evaluated key working myocardial markers normally excluded from the conduction system, Gja1 and Smpx, revealing a shift towards a working myocardial state in the double mutant VCS (Figure 4). We agree that a more comprehensive analysis, such as transcriptome-wide approaches, would offer greater clarity on the extent and specificity of the observed shift from conduction to non-conduction identity. These approaches are appropriate directions for future studies.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (a) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (b) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      And

      In addition, the optical mapping dataset is incomplete and has alternative interpretations that are not excluded or thoroughly discussed.

      We agree with Reviewer #2 that the resolution of the optical mapping experiment may be insufficient to precisely localize the conduction block due to the limited signal strength from the VCS. It is possible that the region defined as the His Bundle also includes portions of the right bundle branch. Our control mice show VCS OAP upstrokes consistent with those reported by Tamaddon et al. (2000) using Di-4-ANEPPS (1). We appreciate the Reviewer’s attention to alternative interpretations, and we will incorporate these caveats into the manuscript text. 

      (1) Tamaddon HS, Vaidya D, Simon AM, Paul DL, Jalife J, Morley GE (2000) High-resolution optical mapping of the right bundle branch in connexin40 knockout mice reveals slow conduction in the specialized conduction system. Circulation Research 87:929-36. doi: 10.1161/01.res.87.10.929. 

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS but do not, as presented, constitute a decisive advance.

      And

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented fit well within the existing paradigm of established functions of Tbx3 and Tbx5 in the VCS and in that sense do not decisively advance the field. Moreover, the authors' claims about the implications of the data are not always strongly supported by the data presented and do not fully explore alternative possibilities.

      We appreciate Reviewer #2’s acknowledgment of the elegance and novelty of the mouse model we generated. However, we respectfully disagree with their assessment that this work merely corroborates existing models without providing a decisive advance. Previous studies have investigated single Tbx5 or Tbx3 gene knockouts in depth and established the T-box ratio model for distinguishing fast VCS from slow nodal conduction identity (1) that the reviewer alludes to in earlier comments. In contrast, this study aimed to explore a different model: that the combined effects of Tbx5 and Tbx3 distinguish adult VCS identity from non-conduction working myocardium. The coordinated role of Tbx3 and Tbx5 in conduction system identity remained untested due to the lack of a mouse model that allowed their simultaneous removal. The very model the reviewer recognizes as “novel and elegantly constructed” has allowed the examination of the coordinated role of Tbx5 and Tbx3 for the first time. While we acknowledge the opportunity for additional depth of investigation of this model in future studies, the data we present provide consistent experimental support for the coordinated requirement of both Tbx5 and Tbx3 for ventricular cardiac conduction system identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes in chromosome 5, preventing the generation of loss of function strategies to delete simultaneously both genes. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterized the cardiac phenotype associated with the cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinKcreERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QRS intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterizing VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analyzed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      We appreciate the Reviewer’s comments regarding the originality and utility of our model and the strengths of our methodological approach. The Reviewer’s appreciation of the molecular and morphological analyses as well as their constructive feedback is highly valuable.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot, will demonstrate that indeed the cells that were specified as VCS in the adult heart, become working myocardium in the absence of Tbx3 and Tbx5 function.

      We appreciate the reviewer’s concern regarding the morphology of the cardiac conduction system in the Tbx3/Tbx5 double conditional knockout model. We did not observe any structural abnormalities, as the Reviewer notes. We agree with their suggestion for using Genetic Inducible Fate Mapping to mark cardiac conduction cells expressing MinKCre. In fact, we utilized this approach to isolate VCS cells for transcriptional profiling. Specifically, we combined the tamoxifen-inducible MinKCreERT allele with the Cre-dependent R26Eyfp reporter allele to label MinKCre-expressing cells in both control VCS and VCS-specific double Tbx3/Tbx5 knockouts. EYFP-positive cells were isolated for transcriptional studies, ensuring that our analysis exclusively targeted conduction system-lineage marked cells. The ability to isolate MinKCre-marked cells from both controls and Tbx5/Tbx3 double mutants indicates that VCS cells persisted in the double knockout. Nonetheless, the suggestion for in-vivo marking by Genetic Inducible Fate Mapping and morphologic analysis is a valuable recommendation for future studies.

      Reviewer #1 (Recommendations for the authors):

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Previous work suggested the prediction that VCS-specific genetic ablation of both the TBX3 and TBX5 would transform fast-conducting adult VCS into cells resembling working myocardium, eliminating specialized CCS fate. The current study suggests that this prediction is at least to some extent accurate.

      We appreciate Reviewer #1’s summary and recognition of our study. As the review notes, the simultaneous deletion of Tbx3 and Tbx5 in the mature ventricular conduction system (VCS) suggests a conversion of VCS to "ordinary" ventricular working myocytes. To our knowledge, this represents a novel observation and experimental model that uniquely captures the combined roles of these essential T-box transcription factors. We believe that this model offers a valuable platform for further investigation into the transcriptional mechanisms underlying conduction system specialization.

      (1) The huge effort made to generate the DKO model contrasts with the limited efforts made to study the mechanism. Conditional deficiency of Tbx3 and Tbx5 creates an artificial situation that is useful for addressing fundamental mechanistic questions. The authors provide a rather superficial analysis of the changes in the VCS upon deletion of these two critically important factors and do not provide really novel insights into their requirement/function in the VCS gene regulatory network and epigenetic state. So to what extent do VCS cardiomyocytes (CMs) from Tbx3/5 DKO mice resemble "simple" working myocardium? To what extent do these cells acquire the working myocardial (epigenetic) state, do these cells have an epigenetic memory of the Tbx3/Tbx5+ history, is the enhancer usage between the modified VCS CMs and the working CMs similar or not, etc.? The assumption that the authors' data indicate that the DKO VCS CMs simply acquire a ventricular working "fate" is unlikely. Following this reasoning, the reverse experiment to induce Tbx3 and Tbx5 expression in working CMs would result in complete conversion to VCS CMs, which is also unlikely.

      To answer such questions, transcriptomic and epigenetic state analysis, electrophysiologic analysis (e.g. patch-clamp), cell/subcellular level analysis, etc. would be required, as well as a comparison of the changed state of the DKO VCS CMs to that of working CMs.

      This initial study focused on generating the Tbx3:Tbx5 double-conditional knockout model and characterizing the resulting physiological and molecular changes within the VCS. We analyzed transcriptomic markers of fast conduction (VCS), slow conduction (nodal), and non-conduction (working myocardium). Additionally, we applied optical mapping to evaluate the physiological consequences of the double knockout, which allowed a calculated AP of the VCS to be generated. We agree that a more in-depth mechanistic investigation of the VCS transformation upon Tbx3/Tbx5 deletion by transcriptomic or cellular electrophysiology could provide a deeper understanding of the precise transcriptional/epigenetic state of the VCS in the double knockout and clarify whether there is a partial or complete conversion of VCS cells to a simple working myocardial phenotype. The suggestions by the reviewer will be considered for future studies.

      (2) Tbx3 stimulates BMP-TGFb signaling (e.g. positive loop between Tbx3-Bmp2), which in turn stimulates EMT and modulates the behavior of endocardial and mesenchymal cells. Did the authors investigate the impact of Tbx3/5 DKO on non-CM cells in and around the VCS? (see also comment 1). The insulation of the AVB for example could be a Tbx3/5 non cell autonomous target.

      We appreciate the Reviewer’s suggestion to examine the impact of Tbx3/Tbx5 deletion on non-CM cells surrounding the VCS. While this is an intriguing avenue for future exploration, it falls outside the scope of the current study, which focused on the cardiomyocyte-specific roles of Tbx3 and Tbx5 in maintaining adult VCS identity.

      (3) The MinK-Cre line used (from the Moskowitz lab) also recombines in the AVN (Arnolds et al 2011). The authors do not mention changes in the AVN, and systematically call the line VCS specific (which refers to the AVB, BB, PVCS I assume). This could also impact the PR interval. Please address.

      The MinK-Cre line recombines in the atrioventricular bundle (AVB) and bundle branches (BB). It also recombines in cardiomyocytes adjacent to the atrioventricular node (AVN). We previously interpreted these cells as the penetrating portion of the His bundle into the AVN. This line does not recombine in the vast majority of physiologic nodal cells, if in any. We also assessed nodal conduction parameters by invasive electrophysiologic (EP) studies. Our data showed that non-VCS parameters, including sinus node recovery time, AV node recovery time, and atrial and ventricular effective refractory periods, remained within normal ranges in Tbx3:Tbx5-deficient mice (please see Figure 2I). These findings indicate that AVN function is preserved in the VCS-specific double knockout, reinforcing the specificity of the observed conduction defects to the ventricular conduction system.

      (4) Did the authors also investigate the electrophysiological changes in the (EGFP+) DKO VCS CMs? Would these resemble the properties of ventricular working CMs, or would they still show some VCS properties? (see also comment 1).

      We performed electrophysiologic analysis of the double knockout by optical mapping. Optical mapping provides tissue-level resolution, capturing the functional behavior of clusters of thousands of cells simultaneously, rather than individual cells. While this technique does not achieve single-cell resolution, it allows for a comprehensive assessment of electrophysiological changes across the VCS region. Single cell electrophysiology is a good idea for future studies. 

      (5) Throughout the manuscript, the authors use "patterning" and "fate", which are applicable to development and differentiation, not to the situation where a gene is removed from fully differentiated cells in an adult organism resulting in a change of these cells. Perhaps more appropriate are "state" change and the requirement for "homeostasis/maintenance" of state.

      We appreciate the Reviewer’s concern regarding the terminology used to describe changes in VCS cell identity. To ensure precision and uniformity, we replaced terms such as “fate” and “patterning” with “state” or “maintenance” to reflect the shift in cellular characteristics in a fully differentiated adult tissue context. 

      Minor:

      (1) Please provide all data points in bar graphs.

      We have incorporated individual data points into the bar graphs as suggested, ensuring enhanced transparency and clarity in the data presentation.

      (2) Formally, gene expression levels between samples are not normally distributed. The Welch t-test used here assumes a normal distribution. Therefore, nonparametric tests should be used.

      We appreciate Reviewer #1’s consideration of the appropriate statistical approach to the qPCR data and clarify our statistical approach here. Normality within each experimental group was assessed using the Shapiro-Wilk test. Between-group comparisons were conducted using the Welch t-test, and multiple comparisons were corrected using the Benjamini & Hochberg method to control the false discovery rate (FDR) (71). If a significant difference was detected between two groups (t-test FDR < 0.05) but normality was rejected in any of the compared groups (Shapiro-Wilk P < 0.05), a non-parametric Wilcoxon rank-sum test was used for verification. A significant group-mean difference was confirmed at one-tailed Wilcoxon P≤0.05 (detailed in Supplementary Data Set I). Furthermore, we have updated the qRT-PCR information in each figure and its legend accordingly. Statistical analysis was performed using R version 4.2.0. We have included a new Supplementary Data Set I, detailing the statistical analysis of qRT-PCR data. Additionally, we have revised the Methods/Statistics section to detail the applied statistical analysis.
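
      The decision flow described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' analysis code (which was run in R): it assumes the Welch t-test and Shapiro-Wilk P values have already been computed elsewhere, and the function names `benjamini_hochberg` and `needs_wilcoxon_check` are hypothetical. It shows the Benjamini & Hochberg adjustment and the rule that triggers the Wilcoxon rank-sum verification.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def needs_wilcoxon_check(t_test_fdr, shapiro_p_a, shapiro_p_b, alpha=0.05):
    """True when the Welch result is significant after FDR correction
    but normality was rejected in at least one of the two groups."""
    significant = t_test_fdr < alpha
    normality_rejected = shapiro_p_a < alpha or shapiro_p_b < alpha
    return significant and normality_rejected


# Hypothetical Welch t-test P values for four genes (illustrative numbers only).
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
# Flag the first gene for verification if either group failed Shapiro-Wilk.
flag = needs_wilcoxon_check(fdr[0], shapiro_p_a=0.03, shapiro_p_b=0.50)
```

      With these made-up inputs, the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, and the first comparison is flagged for the non-parametric check because one group has Shapiro-Wilk P < 0.05.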

      (3) Some of the panels of figures are tiny and cannot be evaluated. For example, in Figure 1B the actual data (expression of Tbx3/5) is impossible to see.

      We appreciate the Reviewer’s observation and have revised the figures to improve visual clarity and ensure that the presented data are easily interpretable by readers.

      Reviewer #2 (Recommendations for the authors):

      Additional Experiments, Data, Analysis:

      (1) Comparisons between both single knockouts and double knockouts at the phenotypic level are needed. In some instances, the data is shown (e.g., mortality and EKG) but direct statistical comparison is not performed. In other instances (optical mapping and gene expression), data with single knockouts are not shown. If combined VCS Tbx3/Tbx5 deletion does not change the phenotype of the VCS Tbx5 single deletion, this should be explicitly stated and discussed.

      We appreciate Reviewer #2’s suggestion to compare the phenotypic outcomes of the Tbx3 and Tbx5 single conditional knockout models with those observed in Tbx3/Tbx5 double conditional knockout model. We have expanded the discussion section of our manuscript to incorporate a more detailed comparison between the double Tbx3/Tbx5 model and the single Tbx5 and Tbx3 models [1-5], highlighting the distinct phenotypic outcomes of the single and double knockouts.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (2) Genome-wide expression analysis including working myocardium would provide stronger evidence for interconversion of cell states. Ideally, this would include single knockouts.

      We agree that a genome-wide expression analysis, including a direct comparison with working myocardium, would provide more comprehensive insights into cell state transitions in Tbx3:Tbx5-deficient VCS cells. Additionally, incorporating single knockout models into such analyses would further clarify the distinct and cooperative contributions of Tbx3 and Tbx5 to maintaining VCS identity. This is a good suggestion for future studies.

      (3) This may not be essential to support the authors' claims, but the addition of epigenetic data from single and double KO VCS using ATAC-seq (which can be performed with relatively small numbers of cells) could provide stronger evidence for cell state changes of the kind hypothesized by the authors.

      We agree that epigenetic data such as ATAC-seq would complement transcriptional analyses and provide insight into chromatin states that underlie the observed cellular reprogramming. This is a good suggestion for follow-up studies to further characterize the molecular state of Tbx3:Tbx5-deficient VCS cells.

      (4) Additional clarification of the optical mapping experiments to exclude alternative interpretations like focal right bundle branch block and to include single knockouts for comparison - if the Tbx5 single KO looks the same as the double KO that would be very important to know and would directly affect interpretation of the experiment.

      Right septal optical mapping preparation involved removing the right ventricular free wall to directly image the right ventricular septum, which contains the VCS. In a healthy mouse, there are two peak components of the optical action potential upstroke, the first peak due to the activation of the VCS and the second due to the activation of the ventricular cardiomyocytes. Importantly, in Tbx3:Tbx5 double-conditional knockout mice, the first peak was absent, rather than delayed, indicating loss of fast conduction through the VCS. This absence suggests a shift in VCS cells toward a ventricular working myocardial phenotype, rather than a regional conduction block or delayed propagation through a structurally intact VCS.

      Previous studies from our group have extensively characterized the effect of single Tbx5 knockout on the VCS in murine hearts [1, 2, 3]. Arnolds et al. demonstrated that VCS-specific Tbx5-deficiency results in significant slowing of VCS conduction, evidenced by prolonged PR and QRS intervals, along with lengthening of the atrio-Hisian interval, His duration, and Hisioventricular interval [1]. Although both single Tbx5 knockout and Tbx3:Tbx5 double knockout mice exhibit slowing of the ventricular conduction system, our optical mapping studies reveal distinct differences in their electrophysiological phenotypes. Burnicka-Turek et al. showed that the single knockout of Tbx5 in the VCS leads to a shift toward a pacemaker cell state, evidenced by ectopic beats originating in the ventricles and inappropriate automaticity [3]. During spontaneous beats, electrical impulses were retrogradely activated, propagating from the ventricles to the atria [3]. Whole-cell patch clamping recordings confirmed that Tbx5-deficient VCS cells displayed action potentials resembling pacemaker cells, characterized by slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization [3]. In contrast, our current study on VCS-specific Tbx3:Tbx5 double knockout demonstrates a loss of the VCS-specific fast conduction propagation. Optical mapping demonstrated the absence of the initial upstroke corresponding to VCS activation in the His bundle region, indicating a shift in the VCS cells toward a ventricular working myocardium state. This loss of fast conduction properties highlights a fundamental distinction between single and double knockouts, suggesting that both Tbx3 and Tbx5 are required to maintain VCS identity and function.

      (1) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (2) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (3) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460.

      Methods:

      (1) Additional methods on FACS are required. The methods section references a paper from 2004 (reference 67) that describes the flow sorting of embryonic cardiomyocytes. However, flow cytometric isolation of intact adult cardiomyocytes, which the authors describe in the present work, is a distinct technique and generally requires special equipment. These need to be described in more detail to be fully replicable.

      We thank Reviewer #2 for highlighting the need to provide additional details regarding our flow cytometric isolation of adult VCS cardiomyocytes. While we referenced earlier methods, we agree that isolating adult cardiomyocytes requires specialized approaches. Therefore, we revised the Methods section to include a detailed description of the equipment, procedures, and adaptations specific to isolating intact adult VCS cells to ensure full replicability.

      Minor Corrections:

      (1) Figure 1D. Please add a statistical test for mortality between the double conditional KO and the Tbx5 conditional KO.

      We have revised Figure 1D to include the statistical test comparing mortality between the Tbx3:Tbx5 double conditional knockout and the Tbx5 conditional knockout cohorts.

      (2) Figure 2A, 2I, 3A: Please include all individual data points not just a bar graph with error bars.

      We have added all individual data points to the bar graphs as recommended, enhancing the transparency and clarity of the data presentation.

      (3) Figure 2A: Please consider separate graphs for PR and QRS with appropriately scaled Y-axis so differences are easier to see.

      We appreciate Reviewer #2’s suggestion and fully agree with it. As a result, we have revised Figure 2A to include separate graphs for PR and QRS intervals, each with appropriately scaled Y-axes. This adjustment enhanced both the readability and the clarity of the observed differences.

      (4) Figure 3 G-K: The figure would be easier to interpret for the reader if genotypes were shown in the figure not just in the legend.

      We agree with Reviewer #2’s suggestion and have revised Figure 3 accordingly by adding genotype labels directly to the histological sections in Panels G-K. This update improves clarity, making the data easier for readers to interpret without needing to refer to the figure legend.

      (5) Figure 4A, C: Are vertical axes mislabeled? They say, "CON VCS and TBX5OE VCS". Please double-check axis labels and data on the graph.

      We appreciate the Reviewer bringing the mislabeling of the vertical axis in Figure 4 to our attention. We have corrected the labeling errors and ensured consistency between the graph and the underlying data.

      (6) Legend to Supplementary Figure 6. Says "Tbx3:Tbx3" instead of "Tbx3:Tbx5".

      We thank Reviewer #2 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

      (7) Discussion. The authors write, "In Tbx3:Tbx5 double VCS knockout, we observed repression of fast VCS markers and also repression of Pan-CCS markers transcribed throughout the entire CCS." The term 'repression' has a specific connotation with transcription regulators that is likely not intended in this context so perhaps 'reduced expression' would be better here?

      We agree with Reviewer #2 and have replaced “repression” with “reduced expression” throughout the text (see the revised sentence below).

      “In the Tbx3:Tbx5 double VCS knockout, we observed a reduction in the expression of both fast VCS markers and Pan-CCS markers transcribed throughout the entire CCS.”

      (8) Discussion, the authors write, "This study combined with prior literature (1, 7, 11, 15, 26, 53, 54) indicates that the presence of both Tbx3 and Tbx5 is necessary for the specification of the adult VCS (Figure 7)." Since this work presents data from an adult conditional deletion, it's not clear how it informs our understanding of the specification, which occurs during development. Perhaps "maintenance of VCS fate" would be more appropriate here?

      We agree with Reviewer #2 that the term “maintenance of VCS fate” is more appropriate in the context of our study. Accordingly, we have updated the text to reflect this terminology.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 2B: It is hard to see the IF images. What is the cardiac structure studied? Maybe a dashed line and a label to define the region and the structure represented will help. As the authors have described that the crosses used contain a reporter allele (R26-EYFP), a clearer way to show these results would be to include images of the linage traced cells with the reporter, not only to identify the CCS structure analyzed, but also to demonstrate that the deletion is specific to the MinK-creERT expression in the CCS.

      We appreciate the Reviewer’s suggestion to improve the clarity of Figure 2B by delineating the cardiac structures analyzed. In response, we have added dashed lines and labels to highlight the regions of interest within the IF images. Unfortunately, we were unable to capture high-quality EYFP fluorescence images for these sections. However, to address this concern, we microdissected the region shown in the IF images and performed FACS to isolate EYFP-positive cells from this specific area. These sorted cells were subsequently used for qPCR analysis, which confirmed the presence of Tbx3 and Tbx5 in control samples and the successful deletion of both genes in the double-conditional knockout samples (Figure 2C, middle panel). We believe this approach provides robust evidence for the specificity of the MinK-CreERT expression in the CCS and the efficiency of gene deletion in the targeted region.

      (2) 3G-K: The authors describe the absence of morphological defects in the tissue sections of adult hearts from the different genotypes analyzed. Although this reviewer agrees that there seem to be no major defects in the general cardiac morphology of these animals, the higher magnification images suggest some tissue differences at the level of the AVN especially in the double HET, double HOMO, and the Tbx3 HOMO. Is that due to the section plane used? If so, more appropriate and comparable sections must be provided. Again, as the crosses used by the authors contain a reporter allele (R26-EYFP), it is required that the authors show that the CCS cells, where deletions are induced, are still present in equivalent areas in the mutants and that they remain in similar numbers only failing to maintain their specification into CCS due to Tbx3 and Tbx5 loss of function.

      This analysis will reinforce the authors' claims on the role of Tbx5/Tbx3 in this process.

      We thank the reviewer for their thorough assessment and thoughtful feedback on our histological analysis. The higher magnification images in Figure 3G-K do not specifically present the AVN. These sections primarily represent areas of the ventricular conduction system (VCS), particularly the His bundle and bundle branches, rather than the AVN itself. We do not believe that the observed morphological differences are related to AVN tissue, and there were no functional deficits attributable to the AVN in the double knockout. Furthermore, the MinK-Cre allele used in this study does not recombine in the AVN proper. We agree that confirming the presence of CCS cells in equivalent regions across different genotypes is crucial. Our approach using FACS-based isolation of EYFP-positive cells from the VCS, followed by qPCR analysis, provides evidence that these cells remain present in double conditional knockouts, although they fail to maintain their specialized gene expression profile. This reinforces our conclusion that Tbx3 and Tbx5 are essential for maintaining the molecular identity of CCS cells, rather than their physical presence.

      (3) Figure 4: The authors performed molecular analysis by qPCR and WB in Tbx5/Tbx3 double mutants to demonstrate that CCS cells lose the expression of CCS genes and express working myocardium genes. Could this be further demonstrated by ISH, HCR, or IF together with lineage tracing to provide evidence that these changes are located where the CCS tissues are in the control embryos? Analysis of 2 or 3 of these markers of each type on tissue sections would be enough.

      We thank the Reviewer for their insightful suggestion regarding additional validation of our molecular findings through ISH, HCR, or IF combined with lineage tracing. However, we would like to clarify that the molecular analyses we performed by qPCR and WB were conducted on EYFP-positive cells that were specifically isolated from the ventricular conduction system (VCS) region of both control and double conditional knockout (dCKO) mice. These EYFP-positive cells were obtained through fluorescence-activated cell sorting (FACS), ensuring that our analyses were confined to the targeted VCS population. Alternate approaches are appropriate for future studies to investigate the precise genomic and molecular nature of the transformation observed in the double knockout.

      (4) Discussion: in the discussion section the authors conclude that the combined role of Tbx5/Tbx3 is critical for the specification of the adult VCS. However, as the Tbx5/Tbx3 loss of function conditions are only induced in adult animals 6 weeks old, would it be more appropriate that their function is the maintenance of the VCS cell fate and that if not present these cells return to the working myocardium fate? If the authors believe that these genes are involved in the induction of VCS specification in adults, then they need to demonstrate that, before the loss of function induction at 6 weeks, these cells are not yet specified as adult VCS.

      We appreciate the Reviewer’s clarification regarding terminology. We agree that our study focuses on adult-specific conditional deletion and thus reflects the maintenance, rather than the specification, of VCS cell fate. Accordingly, we have revised the text to explicitly state that Tbx3 and Tbx5 are critical for maintaining VCS identity in adult mice, and that their loss leads to a shift toward a working myocardial fate.

      Minor:

      (1) There is no consistency in the way the quantitative data is shown in graphs. There are some graphs showing only bars, other dot plots, and other a combination of both. The authors must homogenise the representation of quantitative data showing the different data points in dot plots and not in bar graphs.

      We have standardized the quantitative data presentation across all figures, by including individual data points in bar graphs, ensuring enhanced transparency and clarity.

      (2) Figure 3: The labels defining the genotypes corresponding to the different histological sections of adult hearts (Panels G-K) are missing. Panels J and K are not referenced in the text.

      We thank Reviewer #3 for highlighting these omissions. We have added the genotype labels to the histological sections in Panels G-K of Figure 3 to ensure clarity. Furthermore, we have now referenced Panels J and K in the results and in the supplementary material (see the revised text below).

      “Histological examination of all four chambers demonstrated no discernible differences between VCS-specific Tbx3:Tbx5 double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and control (Tbx3<sup>+/+</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) mice, nor between the double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and single-knockout models for either Tbx3 (Tbx3<sup>fl/fl</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) or Tbx5 (Tbx3<sup>+/+</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>). Ventricular muscle appeared normal without hypertrophy or myofibrillar disarray and no fibrosis was present (Figure 3G, 3I, 3J, and 3K, respectively).”

      “Additionally, we confirmed the absence of histological and structural abnormalities in these mice, aligning with previous findings (Figures 3A, 3F versus 3B, and 3K versus 3G, respectively)(1, 11).”

      (3) Typo: Supplementary Figure 6. Tbx3:Tbx3 double-conditional knockout: it should say Tbx5:Tbx3 double-conditional knockout.

      We thank Reviewer #3 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

    1. Author response:

      General Statements

      We sincerely appreciate the constructive comments from the reviewers, which have significantly enhanced the clarity and rigor of our manuscript. Most of their suggestions have already been incorporated into the revised version. Additionally, we are conducting an additional experiment to further substantiate our conclusions, and preliminary data seem to support our findings.

      As pointed out by Reviewer #1, the regulation of neural circuit function by oligodendrocytes is currently a highly significant and actively studied topic. Our study demonstrates that regional heterogeneity in oligodendrocytes underlies the microsecond-level computational processes in the sound localization circuit. We believe this work represents a substantial contribution to the field.

      Description of the planned revisions

      • Evaluation of node formation along axons sparsely expressing eTeNT (related to Reviewer #2: comment 1)

      Based on the approximately 90% expression efficiency of A3V-eTeNT in NM neurons, we interpreted that vesicular release from NM axons was largely inhibited in the NL region, leading to the suppression of oligodendrogenesis and the subsequent emergence of unmyelinated segments. However, the effects of eTeNT on myelination are likely diverse, and a possibility remains that eTeNT directly disrupted axon-oligodendrocyte interactions, preventing oligodendrocytes from myelinating the axons expressing eTeNT.

      To test this possibility, we have initiated an additional experiment to evaluate formation of nodes along axons, while expressing eTeNT sparsely by electroporation. Preliminary results indicated that unmyelinated segments did not increase, supporting our original conclusion. After completion of the experiment, we will include the findings as a Supplementary Figure associated with Figure 6, which will provide a clearer understanding of how eTeNT influences myelination.

      Description of the revisions that have already been incorporated in the transferred manuscript

      • Revised terminology from "nodal distribution" to "nodal spacing" throughout the manuscript. (Reviewer #1: comment 1)

      • Emphasized that our analyses were focused on the main trunk of NM axons (Reviewer #1: comment 2)

      We explicitly stated throughout the manuscript that we analyzed the main trunk of NM axons and made it clear that our findings do not contradict those of Seidl et al. (J Neurosci 2010), which showed similar axon diameters in the midline and ventral NL regions (page 7, line 7).

      • Added an explanation of the maturation of the sound localization circuit (Reviewer #1: comment 3)

      We explained that chickens already localize sound well at hatch, emphasizing that the sound localization circuit is almost fully developed by E21 (page 4, line 12).

      • Emphasized the diverse effects of neuronal activity on oligodendrocytes (page 10, line 18) (Reviewer #1: comment 4)

      • Added details on the efficiency of A3V-eTeNT expression in NM neurons to the Results section (page 8, line 5) (Reviewer #2: comment 1)  

      • Made it clear in the legend of Figure 6D that the analysis was conducted under conditions in which most of the axons were labeled by A3V-eTeNT (page 31, line 9) (Reviewer #2: comment 2)

      • Clarified the rationale for statistical test selection (Reviewer #2: comment 3.1)

      • Reanalyzed all statistical data with appropriate methods using R (Reviewer #2: comment 3.2)

      • Clearly indicated which statistical tests were used in each figure (Reviewer #2: comment 3.3)

      • Clarified what n and N represent in each experiment (Reviewer #2: comment 3.4)

      • Added individual data points to bar graphs in Figures 5 and 6 (Reviewer #2: comment 3.5)

      • Emphasized the importance of comparing the ITD circuit with that of rodents (page 11, line 32) (Reviewer #2: comment 4) 

      • Softened the expressions related to "determine" (Reviewer #2: comment 5)

      Our study demonstrates that regional differences in the intrinsic properties of oligodendrocytes are a prominent determinant of nodal spacing patterns. However, we acknowledge that this does not establish direct causation. Accordingly, relevant expressions have been revised throughout the manuscript.

      • Added references (Reviewer #2: comment 6)

      • Corrected units in Figure 1G (Reviewer #2: comment 7)

      • Added discussion about the involvement of pre-nodal clusters in the regional differences in nodal spacing (page 9, line 35) (Reviewer #3: comment 1).

      Related to this issue, we have added new data to Figure 6I.

      • Discussed the possibility that the developmental origin and/or the pericellular microenvironment of OPCs contributed to the regional heterogeneity of oligodendrocytes (page 9, line 21) (Reviewer #3: comment 3).

      • Added references used in the response to reviewers into the main text.

      • Corrected the data error in Figure 6G, H

      • Corrected the dataset in Figure 3E

      We limited the data in Figure 3E–G to those measuring both myelin length and diameter simultaneously.

      Description of analyses that authors prefer not to carry out

      • Analysis in adult chickens (Reviewer #1: comment 3,4)

      The chick brainstem auditory circuit is nearly fully developed by E21, and we have also demonstrated that nodal spacing increases by approximately 20% while maintaining regional differences up to P9. Our study therefore covers the period from pre-myelination to post-functional maturation, and we see little need to analyze older animals.

      • Functional evaluation of the efficiency of eTeNT suppression (Reviewer #2: comment 1)

      It is technically challenging to quantitatively assess the inhibition of vesicular release by eTeNT in NM axons, given that multiple synapses from different NM axons converge onto postsynaptic neurons. In addition, previous studies have already validated the efficacy of this construct in multiple species. Therefore, we will not evaluate electrophysiologically the extent of vesicular release inhibition by eTeNT in this study. Instead, we have provided clear evidence that A3V-eTeNT is expressed efficiently and leads to notable phenotypic changes, such as the inhibition of oligodendrogenesis (page 8, line 5).

      • Replacing figures with data averaged per animal (Reviewer #2: comment 3.4)

      Our study focuses on the distribution of morphological characteristics at the single-cell level rather than solely on group means. Averaging measurements per animal could obscure this cellular heterogeneity and potentially misrepresent our findings. Given that data distributions in our plots show clear distinctions, we believe that averaging per biological replicate is not essential in this case. If requested, we will be happy to provide the outputs of PlotsOfDifferences as supplementary source data files, similar to those used in eLife publications, for each figure.
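      The trade-off described above can be illustrated with a minimal sketch. The animal identifiers and values below are hypothetical placeholders, not data from the study, and the helper `per_animal_means` is an assumed name introduced only for this illustration. The sketch shows how averaging per animal can yield identical summaries for animals whose single-cell distributions differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_animal_means(measurements):
    """Collapse (animal_id, value) pairs into one mean per animal."""
    grouped = defaultdict(list)
    for animal_id, value in measurements:
        grouped[animal_id].append(value)
    return {animal: mean(values) for animal, values in grouped.items()}


# Hypothetical single-cell measurements (arbitrary units), for illustration only.
cells = [
    ("animal_1", 10.0), ("animal_1", 30.0),  # heterogeneous cells
    ("animal_2", 20.0), ("animal_2", 20.0),  # homogeneous cells
]

means = per_animal_means(cells)

# Spread within each animal, which the per-animal mean discards.
spread = {
    animal: pstdev([v for a, v in cells if a == animal])
    for animal in means
}
```

      In this toy case both animals have the same mean while only one shows cell-to-cell variation, which is the kind of heterogeneity a per-animal average would hide.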

      • Additional experiments to manipulate oligodendrocyte density (Reviewer #2: comment 5)

      We have already demonstrated that A3V-eTeNT reduces oligodendrocyte density in the NL region, and some of the arguments in our study are based on this result. Therefore, we think that further experiments are not necessary.

      • Verification of the presence of pre-nodal clusters (Reviewer #3: comment 1)

      We investigated the presence of pre-nodal clusters on NM axons, but we could not identify them in the immunohistochemistry of AnkG. As the occurrence of pre-nodal clusters varies depending on neuronal type, we consider that pre-nodal clusters are not prominent in the NM axons and that further experimental validation would not be necessary. Instead, we have added a discussion on the possibility that pre-nodal clusters contribute to regional differences in nodal spacing along NM axons (page 9, line 35).

      • Axon diameter measurements using EM (Reviewer #3: comment 2)

      This experiment was already done by Seidl et al. (2010), and hence, we do not think it necessary to repeat it. We believe that the relative differences in axon diameter between the regions could be adequately assessed using the optical approach with membrane-targeted GFP.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Despite evidence suggesting the benefits of neutralizing mucosa-derived IgA in the upper airway in protection against the SARS-CoV-2 virus, all currently approved vaccines are administered intramuscularly, which mainly induces systemic IgG. Waki et al. aimed to characterize the benefits of intranasal vaccination at the molecular level by isolating B cell clones from nasal tissue. The authors found that Spike-specific plasma cells isolated from the spleen of vaccinated mice showed significant clonal overlap with Spike-specific plasma cells isolated from nasal tissue. Interestingly, they could not detect any spike-specific plasma cells in the bone marrow or Peyer's patches, indicating that these nose-derived cells did not necessarily home to and reside in these locations, although the Peyer's patch is not a typical plasma cell niche - rather the lamina propria of the gut would have been a better place to look. Furthermore, they found that multimerization improves the antibody/antigen binding when the antibody is of low or intermediate affinity, but that high-affinity monomeric antibodies do not benefit from multimerization. Lastly, the authors used a competitive ELISA assay to show that multimerization could improve the neutralizing capacity of these antibodies.

      The strength of this paper is the cloning of multiple IgA from the nasal mucosae (n=99) and the periphery (n=114) post-SARS-CoV-2 i.n. vaccination to examine the clonal relationship of this IgA with other sites, including the spleen. This analysis provides novel insights into the nature of the mucosal antibody response at the site where the host would encounter the virus, and whether this IgA response disseminates to other tissues.

      There were also some weaknesses: 

      (1) The finding that multimerization improves binding and neutralization is not surprising as this was observed before by Wang and Nussenzweig for anti-SARS-CoV-2 IgA (authors should cite Enhanced SARS-CoV-2 neutralization by dimeric IgA. Wang et al., Sci. Transl. Med 2021, 13:3abf1555). 

      We have cited the paper, and the relevant sentence has been modified as follows (line 51-53): “Recent studies have demonstrated that multimeric IgA is more effective and provides greater cross-protection than IgG and M-IgA (Okuya et al., 2020b) (Asahi et al., 2002) (Dhakal et al., 2018) (Asahi-Ozaki et al., 2004) (Wang et al., 2021).”

      In addition, as far as I can tell we cannot ascertain the purity of fractions from the size exclusion chromatography thus I wasn't sure whether the input material used in Fig. 4 was a mixed population of dimer/trimer/tetramer?  

      The S-IgAs used in the SPR analysis in Fig. 4 consist of a mixture of dimers, trimers, and tetramers. The observed values indicate the average affinity of the S-IgAs. Please refer to the revised version (line 278-280).

      (2) The flow cytometric assessment of the IgA+ clones from the nasal mucosae was difficult to interpret (Fig. 1B). It was hard for me to tell what they were gating on and subsequently analyzing without an IgA-negative population for reference. 

      We have updated FACS plots to illustrate the presence of IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in Fig. 1B legend. Please find the relevant statements (line 115-120).

      (3) While the i.n. study itself is large and challenging, it would have been interesting to compare an i.m. route and examine the breadth of SARS-CoV-2 variant S1 binding for IgGs as in Fig. 2A. Are the IgA responses derived from the mucosae of greater breadth than systemic IgG responses? Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We appreciate your consideration. Recent reports indicate that some M-IgA monomers possess neutralizing activity that is equivalent to or less than that of IgGs. However, the opposite phenomenon has also been observed. These results suggest that the Fc does not merely correlate with the degree of increase in antibody reactivity or functionality. We believe the discrepancies in previous studies are due to variations in the binding modes between the epitope and paratope of each antibody clone. Nevertheless, oligomerization enhances the functionality of most monomeric antibody clones, suggesting that the multivalent S-IgA enables a mode of action that is challenging to achieve with a monomeric antibody. Please refer to the revised version (line 399-403).

      Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We have summarized the characteristics of the four types of nasal IgAs in Fig.7 and in the Discussion. Please refer to the revised version (line 405-422).

      Reviewer #2 (Public Review): 

      Summary: 

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization of mice with SARS-CoV-2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant. 

      Strengths: 

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses. 

      Weaknesses: 

      The data presentation needs clarity, and results show mAb ability to inhibit SARS-CoV2 in vitro. How IgA functions in vivo is uncertain. 

      We conducted an additional experiment using a hamster model and confirmed that S-IgAs can protect against SARS-CoV-2 infection. Please refer to the revised version (line 349-373 and 431-438).

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 1A shows antibody titers in nasal lavage fluid and serum of mice post intranasal vaccination with SARS-CoV-2 Spike protein. The Y-axis of this figure is labeled as "U/mg" however these units are not clearly defined. 

      The antibody titers are expressed as optical density (OD450) value per total protein in nasal lavage fluids or serum. Please find the relevant statements (line 113-114).

      Furthermore, what do antibody titers in the nasal lavage fluid and serum look like post-intramuscular vaccination with the same vaccine and dose? Comparison of titers to the intramuscular route as well as to the PBS control would make this data more impactful. 

      We appreciate your consideration. We have not conducted experiments comparing the effects of intramuscular and intranasal administration using the same dosage and adjuvant. Cholera toxin has primarily been used as an adjuvant for nasal immunization, but it is seldom applied for intramuscular injection. We are interested in its impact on the immune compartment when using cholera toxin as an adjuvant for intramuscular injection. We plan to conduct further experiments in the future.

      Lastly, in Figure 1B, the detection of nasal IgG is not shown even though the authors assess nasally-derived IgG in the spleen further into the study.  

      Since the number of lymphocytes that can be collected from the nasal mucosa is limited, there is an insufficient capacity to isolate IgG+ plasma cells after collecting IgA+ plasma cells. Therefore, conducting such an experiment on mice is technically challenging. A larger animal, such as rats, will be necessary to perform this experiment. Further investigation is needed to determine whether antigen-specific IgG+ plasma cells, sharing V-(D)-J with nasal IgA, can be detected in the nasal mucosa.

      (2) There appears to be something amiss with the IgA stain. It is smushed up against the X-axis. Better flow cytometry profiles should be shown. Likewise in Supplemental Fig. 1A, their IgA stain appears to not be working. This must be addressed using positive and negative controls. 

      We have updated the FACS plots to show the IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in the Fig. 1B legend. Please find the relevant statements on line 115-120.

      (3) We do not know the purity of the samples that were subjected to SPR and since the legend of Fig. 4 is partially incorrect, it was difficult to know how this experiment was done. 

      The S-IgA used in the SPR analysis shown in Figure 4 is a mixture of dimers, trimers, and tetramers, and the observed values are believed to reflect the affinity of the S-IgA in the nasal mucosa. Please refer to the revised version (line 278-280).

      (4) Fig. 5 results need to compare with some of the well-characterized mAb (IgG) to understand the biological significance of these neutralizing titres. 

      We have summarized the characteristics of the four types of nasal IgA in Fig. 7 and in the Discussion. Please refer to the revised version (line 405-422).

      Communication of results: 

      (1) Authors could improve the communication of their results by introducing the vaccination protocol in the results section accompanied by a diagram of the vaccination strategy (nature of the Ag, route, and frequency). This could be Fig. 1A.

      A schematic diagram of the vaccination protocol is presented in Fig.1.

      (2) Care should be taken with some of the terminology. Intranasal is the accepted term but authors sometimes use "internasal". The term "immunosuppression" on page 2 could be misleading as it means something different to other audiences. The distinction when speaking about "protection from harmful pathogens" should be made between protection against infection (ie sterilizing immunity) vs protection against disease (ie morbidity and mortality). Instead of "nose", one should say "nasal". Nose-related could be rephrased as "potentially nasal-derived". P.5, line 2 didn't make sense: "IgG+ plasma cells that express nose-related IgA"...

      In many places, Spike is missing its "e".  

      We have made the correction accordingly.

      (3) Page 3: The lumping of the human and animal SARS-CoV-2 intranasal studies together is a bit misleading. Very little has worked for intranasal vaccination against SARS-CoV-2 in humans at this point in time (although hopefully that will change soon!). Authors should specify which studies were done in animals and which were done in humans. 

      The manuscript has been revised to include two citations on line 73-75 (Ewer et al., 2021 and Zhu et al., 2023).

      (4) What is ER-tracker? It comes out of nowhere and should be explained why it was used to the reader (as well as why they used the other markers) to sort for Spike-specific PC. 

      ER-Tracker is a fluorescent dye that is highly selective for the endoplasmic reticulum of living cells. Because plasma cells have an expanded endoplasmic reticulum for properly folding and secreting large quantities of antibodies, using ER-Tracker along with anti-CD138 facilitates the isolation of plasma cells from lymphocytes without the need for additional antibodies. Please refer to the revised version for details (line 130-134).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.  

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.  

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.  

      Strengths:  

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-1 demonstrate a complex interaction.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times, however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated, however, they are very different; sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus, therefore it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite (at least part of the nervous system) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation are occurring. These experiments will allow the results to be interpreted with clarity, without them, it isn't actually clear what biological process is actually being studied.  

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (though perhaps not the majority) use a less stringent definition of the word habituation than the one presented by this reviewer. In particular, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here (see, e.g., PMID: 22337205; PMID: 18947923; PMID: 17258858; PMID: 20685171; PMID: 15978487). In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ after our previous publication is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, depending on the definition used). But we agree that using the word “habituation” came at the cost of creating confusion about the exact nature of the process, both for those who use the stricter definition of the word “habituation” and for those outside the narrower field of pain research. In the revised manuscript, we have thus changed this terminology to “adaptation”. Also following suggestions from Reviewer 2, we have strengthened the description of the protocol in the Results section and clarified why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal. One of the most convincing pieces of evidence that it cannot be solely explained by “damage” or “exhaustion” is simply the existence of non-adapting mutants (like cmk-1(lf)) or pharmacological treatments (Cyclosporin A) that block the adaptation effect and enable worms to reverse continuously for hours without any problems.  

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we now more extensively cover in the Discussion section.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets argues against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates, as this reviewer suggests.

      (1) In the peptide library analysis, Trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and, in addition, remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the Trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevents the detection of all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.

      We have now extended the treatment of the limited overlap between the two datasets in a dedicated paragraph of the Discussion. We also clarify that we tend to place more trust in the protein-library dataset (since substrates are in a configuration closer to that in vivo), and that we regard the hits also present in the peptide dataset (as TAX-6 was) as the most convincing, since they could be validated in a second type of experiment.
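      The overlap reasoning above can be made concrete with a minimal sketch, assuming each dataset is reduced to a list of substrate identifiers. The identifiers below are hypothetical placeholders, not hits from either library, and `overlap_summary` is an assumed helper name introduced only for this illustration. The function reports the shared hits and the fraction of the smaller list that is recovered in the larger one.

```python
def overlap_summary(hits_a, hits_b):
    """Summarize the overlap between two hit lists of substrate identifiers."""
    set_a, set_b = set(hits_a), set(hits_b)
    smaller, larger = sorted((set_a, set_b), key=len)
    shared = smaller & larger
    fraction = len(shared) / len(smaller) if smaller else 0.0
    return {
        "shared": sorted(shared),
        "fraction_of_smaller_recovered": fraction,
        "fraction_of_smaller_missing": 1.0 - fraction if smaller else 0.0,
    }


# Hypothetical identifiers, for illustration only.
list_one = ["sub_01", "sub_02", "sub_03", "sub_04"]
list_two = ["sub_01", "sub_05", "sub_06", "sub_07", "sub_08", "sub_09"]

summary = overlap_summary(list_one, list_two)
```

      With strict, independently applied hit criteria, a high "missing" fraction of this kind can reflect false negatives in either list rather than unreliable hits, which is the interpretation given in points (1) to (3).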

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.  

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so running both added value. We propose that the subset of targets present in both datasets is the most solid list of candidates. We have reinforced this point in the discussion.  

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.  

      We kept the focus of the article primarily on TAX-6, because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we did not have cnb-1(gf) mutants with which to pursue the analysis, and the constitutively high reversal rate of cnb-1(lf) prevented any further follow-up. We have added a supplementary file to present the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).  

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer comment, we made modifications to the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point, that cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of the regulation effect via the thickness of the arrow lines (which we have now clarified in the legend in the revised ms). The main ‘quantitative’ nuances to take into consideration here originate from 2 assumptions of the model (which we have clarified in the revised ms):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 antiadaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of Cyclosporin A. Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and tips the regulation toward faster adaptation (the role of TAX-6 becoming negligible in the CMK-1(gf) background, and ipso facto that of Cyclosporin A).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We have modified the label to “normal adaptation” and left a note in the legend that an apparently normal adaptation phenotype seems to be the default situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.  

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. While RIM is indeed a neuropeptide-rich neuron, all these neurons actually express neuropeptides. Following this helpful suggestion, we have slightly expanded the discussion of hypothetical cellular pathways that can be modulated downstream of CMK-1 in AFD. We also slightly lengthened the discussion to mention hypothetical post-synaptic targets of TAX-6 within interneurons based on the literature.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.  

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be gathered for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years… and the robust non-adapting phenotype was a reference point and a quality control when analyzing other nonadapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).  

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We devoted considerable effort to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. As a result, the only useful data regarding the CMK-1 place of action are the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):  

      Summary:  

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermonociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.  

      Strengths:  

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.  

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.  

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.  

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.  

      We understand this point and we have carefully considered (and reconsidered) the way to articulate the report. However, we could not present the story much differently, as we would have no justification to investigate the role of TAX-6 and its interaction with CMK-1 if we had not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and liable to trigger wrong expectations among readers. We have thus extensively revised the abstract to clarify this point. Furthermore, we have reinforced this point in the last paragraph of the introduction and in the conclusion paragraph of the Discussion.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and have explicitly mentioned this aspect in the abstract, in the end of the introduction, and in the discussion section.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.  

      We are not aware of any study showing that Calcineurin is a direct target of CaM kinase I. However, it was found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We have complemented the text to mention that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      (1) The authors might consider reorganizing the results, so that the substrate phosphorylation analysis follows the cmk-1 habituation data, as it may not be clear to the reader why you are looking for substrates downstream of cmk-1 at that point. Or the authors could mention the previous habituation data for cmk-1 at the beginning of the results.  

      Thank you. This is something that we considered while (re-)writing. However, we prefer to keep the CMK-1 data side-by-side with the TAX-6 data in the results section. Nevertheless, we have modified the last paragraph of the introduction to improve the transition and to justify the specific interest of searching for CMK-1 targets in the context of the present study.

      (2) Line 209: 'controls' is too strong a word. 'regulates' would be better, and it should be stated that this is for 'spontaneous reversal behavior'.  

      Thank you. This was modified.

      (3) Line 359: we suspect that these reflect functional enrichments.  

      We don’t see what exactly would be wrong with the original sentence. The proposed change (if it is a proposed change) would completely obliterate the intended meaning of our sentence. We rewrote the sentence to be as clear as possible, as follows: “Even if we cannot rule out an actual inclination of the CaM kinase pathway to regulate these processes, we suspect that these GO term enrichments rather reflect an analytical bias toward abundant proteins.”

      (4) Line 563: In this subsection, it is not made clear when the T0 and T60 heat pulses are given, in relation to the 20s ISI heat pulses given for 60 minutes. Are they the first and last pulse, or given some time before or after this train of heat pulses?  

      Thanks for spotting this poor description, which we have improved in the revised manuscript. The heat pulse recording is performed immediately before and immediately after the 60 min of repeated stimulation. After the T0 heat pulse recording there is a period of about 30 s (period of post-stimulus recording + transfer from the recording device (INFERNO) to the habituation device (ThermINATOR)). For the T60 acquisition, there is a lag of about 50 s between the last ‘habituation’ stimulus and the recording stimulus (time needed to move the plate between the habituation device and the recording device + 40 s of baseline reversal recording in the absence of heat stimuli).
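      The timing described above can be laid out as a simple timeline. The following is a minimal sketch that uses only the figures quoted in this exchange (20 s ISI, 60 min of stimulation, lags of about 30 s and 50 s). The constant and function names are hypothetical, and reading the ISI as an onset-to-onset interval is an assumption.

```python
# Sketch of the stimulation timeline, in seconds relative to the T0 heat pulse.
# Durations are taken from the response above; treating the 20 s ISI as an
# onset-to-onset interval is an assumption.

ISI_S = 20                   # inter-stimulus interval during habituation
TRAIN_DURATION_S = 60 * 60   # 60 min of repeated stimulation
LAG_AFTER_T0_S = 30          # approx. post-stimulus recording + plate transfer
LAG_BEFORE_T60_S = 50        # approx. plate transfer + 40 s baseline recording


def build_timeline():
    """Return (T0 pulse time, habituation pulse times, T60 pulse time)."""
    t0_pulse = 0
    train_start = t0_pulse + LAG_AFTER_T0_S
    n_pulses = TRAIN_DURATION_S // ISI_S
    train_pulses = [train_start + i * ISI_S for i in range(n_pulses)]
    t60_pulse = train_pulses[-1] + LAG_BEFORE_T60_S
    return t0_pulse, train_pulses, t60_pulse


t0_pulse, train_pulses, t60_pulse = build_timeline()
print(len(train_pulses))   # 180 habituation pulses
print(train_pulses[0])     # 30: first habituation pulse, about 30 s after T0
print(t60_pulse)           # 3660: T60 pulse, about 50 s after the last one
```

      Under these assumptions the T0 and T60 recordings are separate pulses that bracket the train, rather than the first and last pulses of the train itself.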

      Reviewer #2 (Recommendations for the authors):  

      (1) There appears to be little to no connection between the phosphorylation site discovered in Calcineurin (S443) and the behavioral phenotypes being studied. What is the thermo-nociceptive response if phosphorylation of S443 in Calcineurin is blocked (using a S443A mutation) and/or combined with CMK-1 gain of function?  

      Thanks for the suggestion. The suggested analysis is complicated by several factors. First, the tax-6(lf) mutant is not directly suitable for rescue analysis (until we have identified a way to restore baseline reversal), so we cannot use a S443A-carrying rescue transgene. Second, the truncated TAX-6(GF) mutant lacks the C-terminal part, including S443, so we cannot introduce a S443A mutation in this context. The remaining approach would be to modify the endogenous locus. This again is complicated by the fact that S443 exists in two different isoforms (with conserved RxxS motifs in two different alternative exons). It will be very difficult to perform these experiments until we know more about the expression pattern and function of the respective isoforms. This is work in progress, but this analysis will need to await a future publication.

      (2) The authors should state clearly if Calcineurin is a novel substrate of CaM Kinase or if this is already known in the field.  

      We have complemented the text mentioning we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      (3) The logical flow of the manuscript could be improved given that CMK-1 and Calcineurin appear to act in different cells to regulate nociceptive habituation.  

      As detailed above, we have considered this point carefully and modified the introduction and the abstract. The discussion about the two places of action was also improved.

      (4) More detail about the experimental methods used for the heat-evoked reversals should be included in the Results section.  

      Thanks for the suggestion. We have improved the description in the Methods section and expanded the partial description in the Results section, so that readers can hopefully proceed without needing to go back and forth to the Methods.

      (5) Check for typos. For example: line 197 - fix typo "...to a series repeated heat stimulation...".  

      Thank you. We have carefully read the revised manuscript to correct remaining typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine that is difficult to express, so that antigenic loops could potentially be grafted onto a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions are needed to better organize the text and figures and to clarify a number of points. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated the reviewers' feedback to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8, which is already a good expressing NA. Instead, the loop-grafting and the in vivo experiments were done to show the loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the author's attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on the NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes reported for polyclonal sera and mAbs via escape virus selection or crystal structural studies. There are 45 residue examples from escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and Loops 23, which include the lateral sides and edges of NA). We have also included the epitopes of underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data are presented in a semi-quantitative manner for simplification. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of Empty VLP sera was included as a negative control, showing no inhibition at 1:40 dilution. In the figure legends, we have stated "Pooled sera to unconjugated mi3 VLP was negative control and showed no inhibition at 1:40 dilution (not included in the graphs)"

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      Tetrameric conformation of soluble proteins is evidenced by the size-exclusion chromatographs shown in Figures 3a and 6b. The BS3 crosslinked SDS-PAGE are only suggestive data, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower molecular weight bands may also be observed if crosslinking is incomplete.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      - Description of Figure 2 on page 3 should go before Figure 3 lines 87-105 or swap the order of the two figures.

      We have moved lines 91-96, which refer to Figure 3, to appear after Figure 2.

      - Figure 3a, an EC50 should be calculated for both NA activity assay.

      Figure 3a has been updated to include the EC50 and AUC (Area under curve) values for both NA activity assays. The same update has also been made for Figure 6b.
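      For readers unfamiliar with these two summary metrics, the sketch below shows one simple way they can be derived from a titration series. It is not the authors' analysis, whose fitting method is not stated here. The function names and the dilution data are hypothetical, and the EC50 is estimated by log-linear interpolation rather than by the curve fit that dedicated software would normally perform.

```python
import math


def auc_trapezoid(conc, response):
    """Area under the response curve over log10(concentration), trapezoidal rule."""
    log_conc = [math.log10(c) for c in conc]
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(log_conc, log_conc[1:], response, response[1:])
    )


def ec50_by_interpolation(conc, response):
    """Concentration giving half of the maximal observed response.

    Interpolates linearly in log10(concentration) between the two points that
    bracket the half-maximal response. Assumes conc is sorted in ascending
    order and that the response rises with concentration.
    """
    half = max(response) / 2.0
    for c0, c1, r0, r1 in zip(conc, conc[1:], response, response[1:]):
        if r0 <= half <= r1 and r1 > r0:
            frac = (half - r0) / (r1 - r0)
            log_ec50 = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_ec50
    raise ValueError("half-maximal response is not bracketed by the data")


# Hypothetical two-fold dilution series (arbitrary units), for illustration only.
conc = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
signal = [0.0, 10.0, 30.0, 70.0, 90.0, 100.0]

ec50 = ec50_by_interpolation(conc, signal)
auc = auc_trapezoid(conc, signal)
print(f"EC50 = {ec50:.2f}, AUC = {auc:.2f}")
```

      The AUC summarises the whole curve and so remains usable when a sample never reaches a clear plateau, whereas the EC50 depends on where the maximum is taken to be.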

      - Line 150, I'm not sure it's appropriate to cite a manuscript that was in preparation but not published. I'm referring to the two mAbs AG7C and AF9C that were claimed to bind to the L01 and L23 loops but not.

      We have changed the "manuscript in preparation" to "personal communication with Dr. Yan Wu, Capital Medical University".

      - The description in Figure 4a is lacking.

      We have added a detailed description for Figure 4a.

      - Figure 4c, sufficient description is needed. For example, the cavity should be outlined and annotated, what is the role of Val149? Why the first monomer is assigned a number of II and the second monomer with a number of I.

      We have added a detailed description for Figure 4c and amended the figure as per the reviewer’s suggestions.

      - Figure 5a, in addition to ELLA data to mSN1 and N1/09, ELLA data to N1/19 should also be measured and shown. Figure S7, please show IC50 instead of curves for better comparison.

      We included IC50 for mSN1 and N1/09 as we intended to associate the loops with protection. Graphs for N1/19 have not been reported, but the IC50 titres from pooled sera are shown in Supplementary Figure 7 as a representation. Due to the limited serum samples obtained from tail vein bleeds, these assays were performed using pooled sera, which represent the total response (as established in a number of experiments).

      - Line 234-238, the author made a statement about the data shown in Figure 7b "These results mirrored several studies in the literature which showed that immunization with the 2009 N1 could provide at least partial protection in mice and ferrets to the avian H5N1 challenge". The data did not reflect that. In Figure 5b, mSN1 protects as well as other proteins. In fact, there was no advantage of N109 and N109 hybrid over mSN1 in protection against the homologous H1N109. Although higher levels of NAI antibodies were induced with the homologous protein in Figure 5a. The protection could be contributed by non-NAI antibodies, so the authors should measure binding antibodies. The author may increase the challenge dose from 200 LD50 to 1000 LD50 to see a difference due to the strong immunogenicity of the nanoparticles vaccine plus addavax. Otherwise, it looks like loop grafting is not necessary as heterologous NA could broadly protect.

      We agree that mSN1, despite its low NAI titres, was as protective as the homologous NA or its hybrid NA against H1N1/09 virus challenge at 200 LD50. There may be additional protective components, including non-NAI antibodies, in the homologous groups that may have contributed to the protection.

      We assessed sera binding to H1N1/2009 and found that the binding antibody levels were also lower in the mSN1 group. The corresponding graph has now been added in Figure S7d. It was difficult to determine the NAI titre required to confer protection in this experiment. For this reason, we later chose PR8 as the challenge virus to demonstrate loop-specific protection.

      We are uncertain whether a 1000 LD50 challenge would have helped establish a correlation between protection and NAI IC50 titres, as the dose used is already lethal for DBA/2 mice.

      - Why would the authors separate work with N1/09 and N1/19 from PR8 N1? To this reviewer's understanding, they are all the same strategies with increasing numbers of dissimilar residues from N1/09 (12) to N1/19 (16) and to PR8 (18). They are all characterized by the same approaches in vitro and in vivo.

      We had two different goals for making hybrids with N1/09 and PR8 N1, therefore, we have presented these results separately.

      (1) For N1/09 and N1/19, we showed that loop-grafting improved protein yield and stability. Additionally, we showed that the N1/09 hybrid can be as protective as the homologous protein.

      (2) PR8 N1 is a high-yielding protein, so loop grafting did not significantly increase its yield. However, the PR8 virus challenge confirmed loop-specific protection.

      - For in vivo study testing the PR8 construct, although PR8 and PR8 hybrid protect better than the heterologous mSN1, the hybrid again did not show any advantages over the PR8 original proteins.

      That's correct - the PR8 hybrid was not advantageous over the original PR8 protein. However, the purpose of this experiment was to demonstrate loop specific protection. The PR8 hybrid (PR8 loops - mS scaffold) protected 6/6 mice, whereas mS hybrid (mS loops - PR8 scaffold) provided no protection.

      - Line 243-249, lack of reference to figures.

      References to Supplementary Figure 7b,c and Figure 2 have been added.

      - What was the reason that the challenge was done with 200 LD50 for 2009 H1N1 and 1000 LD50 for PR8?

      Viruses were titrated in the BALB/c strain for PR8 virus and the DBA/2 strain for X-179A (H1N1/2009) virus. These doses were selected based on their lethality and the time required to reach the endpoint (~20% weight loss) post-infection, which is 5-6 days. Most studies in the literature have used 10 LD50 or higher; thus the virus doses we used are relatively high.

      - Line 268, there is no Figure 5C.

      This was a mistake and has been corrected to Figure 6c.

      - Line 275 what are the readers supposed to see in supplementary Figure 5a? There is not enough description for the referred figures.

      A sentence has been added to Fig S5a description, to make a point about recognition of the NA scaffold by mAb CD6. "Binding by mAb CD6 is predominantly scaffold dependent and occurs across two protomers"

      - The discussion is very long and some of it is not relevant to the study. For example, the role of the tetramerization domain and the basis for structurally stable tetramer formation, were not the focuses of this study.

      We felt it was important to discuss the tetramerisation domain and the basis for stable tetramer formation. A previous study by Ellis et al.  used the VASP tetramerisation domain and introduced multiple NA interface mutations to achieve a more stable closed conformation. In contrast, NA proteins used in our study required the tetrabrachion tetramerisation domain to form a properly assembled tetramer.

      In lines 382-383, there is one unfinished sentence.

      This is corrected.

      The definition of the loops is also confusing. Line 381, the author stated that in the N1/19 hybrid design, residue N200S, could have been considered as part of the loop B2L23, and was it not?

      The designation of loop ends should not be rigid but rather based on multiple factors, such as their proximity to antigenic epitopes, charge, and hydrophobicity. This is discussed in the "Definition of loops" section.

      - Figure 1a and Figure S2, please provide sufficient descriptions, what do the blocks in different colors mean?

      We have updated the Figure 1a legend to indicate the colours.

      The descriptions for Figures S1 and S2 have also been revised for clarity.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Line 37: Should be 'Influenza virus neuraminidase'.

      This is corrected.

      (2) Line 65: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ indicate that protective mAbs bind all over the NA head domain.

      We have discussed the epitopes on the NA head in detail in the section "The distribution of epitopes on Neuraminidase". In Supplementary Figures 1 and 2, we compiled several studies, including those on polyclonal sera and mAbs epitopes, emphasizing that loops 01 and 23 are the predominant antibody targets (~90%). Some antibodies also bind to the underside of NA. We have discussed and referenced these studies accordingly.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      The first reference has been included in both our discussion and Supplementary figure 1.

      The NA epitopes discussed in the second reference have also been incorporated into our discussion and Supplementary figures 1 and 2. Note that the E258K mutation on the NA underside was not relevant to mAbs and was generated randomly by passaging of the H3N2 A/New York/PV190/2017 virus.

      The third reference pertains to murine mAbs against influenza B virus NA.

      (3) Lines 71, 72, and throughout: 'et al.' should be in italics.

      All "et al." have been italicised.

      (4) Many abbreviations are not defined including CHO, SDS-PAGE, MUNANA, mi3, HEPES, BSA, TPCK, MWCO, HRP, PBS, TMB, TCID50, LD50, MES, PEG, PGA, MME, PGA-LM.

      The text has been amended to define these abbreviations.

      (5) Line 209: Shouldn't this be ID50 instead of IC50? Also, it is not defined.

      IC50 has been defined.

      (6) Line 210, line 346, line 581-582: No need to capitalize letters at the beginning of words mid-sentence.

      This is amended.

      (7) Line 227: Is 2009 H1N1 NA meant?

      This has been changed to "H1N1/2009 neuraminidase"

      (8) Line 310: Is this really quantitatively true? (see major comment 1).

      Based on the compilation of epitopes from published NA mAbs and polyclonal sera (via escape mutagenesis and NA-Fabs crystal structures), it is accurate to state that the protective epitopes are primarily located within loops 01 and 23.

      Please also refer to our response to minor point 2. 

      (9) Line 352 and throughout the manuscript: 'in vitro' should be in italics.

      This is amended.

      (10) Line 355: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ should be included here.

      Studies reporting epitopes on Influenza A neuraminidase have been compiled in Supplementary Figures 1 and 2 and cited appropriately.

      (11) Line 365: https://pubmed.ncbi.nlm.nih.gov/35446141/ and https://pubmed.ncbi.nlm.nih.gov/33568453/ also describe epitopes on the underside of the NA.

      Please refer to the above response to point 10.

      (12) Line 365: Reference https://pubmed.ncbi.nlm.nih.gov/37506693/ is missing here.

      The reference has been added.

      (13) Line 369-371: Is it really a minority?

      In terms of the protective response, the majority of the antibody response is directed towards loops 01 and 23, which form the top antigenic surface. The term 'lateral' is used in some literature to describe NA mAb epitopes; loops 01 and 23 also encompass the lateral regions.

      To clarify this, we have added the following sentence to the Discussion section - "The distribution of epitopes on neuraminidase"

      "It is important to note that loops 01 and 23 include a portion of epitopes that have been described in the literature as side, lateral, or underside (see mAbs NDS.1, NDS.3, and CD6 in Supplementary Fig. 2)"

      Additionally, in our studies in mice, we showed that protection is mediated by antibodies targeting the loops (Figure 7). We are uncertain about the binding response to the NA underside, but the NA-inhibiting and protective response to the underside appears to be minimal.

      Furthermore, Lederhof et al. showed that among the 'underside' mAbs, NDS.1 protected mice against virus challenge, whereas NDS.3 did not. In our analysis (Supplementary Figure 2), NDS.1 makes eight-residue contacts with B4L01 and B5L01, whereas NDS.3 makes five-residue contacts with B3L01 and B4L01.

      (14) Line 530: The A in ELLA already stands for assay.

      This is corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression.

      Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM. 

      Strengths: 

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex. 

      Weaknesses: 

      No major weaknesses were noted. We have minor suggestions for improving the clarity of the manuscript that are detailed in the "recommendations for the authors" section. 

      We thank the reviewer for the suggestions and addressed them as described in the "recommendations for the authors" section.

      Reviewer #2 (Public review): 

      Summary: 

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions. 

      Strengths: 

      The data presented in the manuscript is of high quality and supports major conclusions. 

      Weaknesses: 

      The statistical methods used are not clearly described, and some marked nonsignificant results appear visually significant, which raises concerns about data analysis. 

      Data presentation requires improvement. 

      We thank the reviewer for the comments. We updated the text in the Materials and Methods section to state the statistical methods and improved the figures as described in detail in the "recommendations for the authors" section.

      Recommendations for the authors:

      (1) Please include testis data in Figure 2 given previous work by authors showing that elevated mtDNA copy number can improve testis function. It would be interesting to compare the changes in mtDNA copy number in testis to these other tissues.

      We measured mtDNA copy number in testis using the CytB probe and added it as Supplementary figure 2 A.

      (2) The clarity of Table 1 could be improved. It is difficult to know whether the changes in the TFAM to mtDNA ratio are driven by changes in TFAM levels or mtDNA copy number. A suggestion is to include the TFAM and mtDNA values in parenthesis next to each listed ratio.

      We updated Table 1 and included the values of the normalized TFAM and mtDNA levels in parentheses.

      (3) The authors should consider showing TFAM western blot data in Figure 1.

      We thank the reviewer for the suggestion but would like to keep the TFAM western blot data with the other western blot data for the respective tissue.

      (4) The graphs for qPCR data (e.g. Figure 2) show mRNA or mtDNA levels relative to the control, which is always set to 1. Why, then, does the control group display error bars?

      For the normalization of the data to the WT group, we first calculate the average of the values from all the samples of the WT group. We then divide all values from the samples of all groups, including the WT group, by that average value. By doing so, we set the average value of the WT group to 1 and express all values from all samples of all groups, including the WT group, relative to this average value. Differences between the samples of the WT group are hence retained and allow for error calculations and the display of error bars.  
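
      The normalization described above can be illustrated with a minimal sketch. The group names and raw values below are hypothetical and are not taken from the study; the point is only that dividing every sample by the WT group mean sets the WT average to 1 while preserving the spread within the WT group, which is what the error bars display.

```python
from statistics import mean, stdev


def normalize_to_control(groups, control="WT"):
    """Divide every sample in every group by the mean of the control group."""
    control_mean = mean(groups[control])
    return {
        name: [value / control_mean for value in values]
        for name, values in groups.items()
    }


# Hypothetical raw measurements (arbitrary units), for illustration only.
raw = {
    "WT": [8.0, 10.0, 12.0],
    "mutant": [4.0, 5.0, 6.0],
}

normalized = normalize_to_control(raw)

# The WT mean becomes 1, but individual WT samples still differ from
# each other, so a standard deviation (and error bars) can be computed.
wt_mean = mean(normalized["WT"])
wt_sd = stdev(normalized["WT"])
print(wt_mean, wt_sd)
```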

      (5) Page 3 second sentence to the last: overexpression of TFAM leads to...? Did the author mean mtDNA?

      We updated the text to “Heterozygous knockout of Tfam in wild-type mice results in ~50% decrease of mtDNA levels, whereas moderate overexpression of Tfam leads to ~50% increase in mtDNA levels25,26”

      (6) The sentence "In summary, mtDNA copy number regulation is more complex than previously assumed and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner" - not clear who assumed (references?) and based on what data, please rephrase.

      We updated the text and it now reads “In summary, mtDNA copy number regulation is more complex than suggested by previous studies23–27 and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner.”

      (7) The significant increase in complex II activity under TFAM overexpression (Figure 3) warrants additional discussion.

      We updated the Results section and it now reads “We detected increased levels of the complex II subunit Succinate Dehydrogenase Complex Iron Sulfur Subunit B (SDHB). Complex II is exclusively nuclear encoded and a compensatory increase upon impaired mitochondrial gene expression has been observed before32.

      We proceeded to measure the enzyme activities of individual OXPHOS complexes in liver mitochondria (Fig. 3C). The complex I and complex IV activities were reduced to about 50% in Polg-/mut; Tfam+/+ mice in comparison with wild-type mice (Fig. 3C). However, we did not see any further alteration of the reduced enzyme activities induced by TFAM overexpression or reduced TFAM expression (Fig. 3C). Interestingly, we detected a significant increase in complex II and complex II + complex III activity upon TFAM overexpression, which can partially be explained by the increased complex II protein levels we observed in Polg-/mut; Tfam+/OE mice (Fig. 3, B and C).”

      (8) The statistical methods used should be explicitly stated. Some results marked as non-significant appear visually significant, for example, mt-Cytb in Figure 2C, Supplementary Figure 2B).

      We updated the text in the Materials and Methods section to state the statistical methods and it now reads “Statistical analysis and generation of graphs were performed with GraphPad Prism v9 software except for quantitative mass spectrometry data which was analyzed and plotted using R as described above. Statistical comparisons were performed using one-way analysis of variance (ANOVA), and post hoc analysis was conducted with Dunnett’s multiple comparisons test. Values of P < 0.05 were considered statistically significant.”
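
      As a rough illustration of the test named above, the following sketch computes the F statistic of a one-way ANOVA by hand. It is not the analysis performed in the study, which used GraphPad Prism. The example data are hypothetical, and the p-value and Dunnett's post hoc comparison are deliberately omitted because they would require a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for one control group and two test groups.
example_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```

      A large F statistic indicates that the differences between group means are large relative to the scatter within groups; whether a particular group differs from the control is then addressed by the post hoc test.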

      Minor points: 

      (1) Replace numerical indications of significance with asterisks for consistency.

      We replaced all numerical indications of significance with asterisks.

      (2) Abbreviations SKM and BAT are not defined.

      We removed the mention of SKM (skeletal muscle) as the data from this tissue were not included. The Introduction reads “In contrast, in brown adipose tissue (BAT), a decrease in TFAM levels normalized Uncoupling protein 1 (Ucp1) expression.”

      (3) Use uniform scales across bar graphs in Figure 2 to improve clarity.

      We updated Figure 2 to have uniform scales.

      (4) Remove or increase the transparency of data points in Figure 1A to make group averages more discernible.

      We removed the data points in Figure 1A.

      (5) Add a Y-axis title to Figure 1C.

      We added the Y-axis title “Heart / body weight” to Figure 1C.

      (6) Size of the font used in some figures (4?) is not appropriate.

      We increased the font size for the figures.

      (7) All figure legend titles need work. Insert "expression" after TFAM in the Figure 2 title. Change the title to "Modulation of TFAM expression..." in Figure 4. 

      The figure legends now read as follows:

      “Figure 2: Modulation of TFAM expression affects mtDNA copy number in a tissue-specific manner.”

      “Figure 4: Alteration of TFAM expression does not affect the heart phenotype of mtDNA mutator mice.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper are compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus, resulting in downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain about the mechanistic details by which PiwiL1 mediates rRNA downregulation, and whether this is a function of Piwi in an unperturbed/wild-type setting. There is certainly some evidence provided in support of a natural function for piwi in regulating rRNA transcription (Figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous mutant (Figure 6F) is very modest and, in my opinion, not convincingly different from the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confusing here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs, but the reduction in numbers does not support this model. Perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through Pol I inhibition with actD and BMH-21 is surprising. What, then, drives the accumulation of PiwiL1 in the nucleolus if there is less transcription in the meioc mutant anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (Figure 5A+5B), the H3K9 ChIP-qPCR signal was decreased by about 26% (Figure 6F), and the Piwil1 ChIP-qPCR signal was decreased by 35% (Figure 6G), so we do not think there is a large discrepancy. On the other hand, in meioc<sup>mo/mo</sup> the H3K9 ChIP-qPCR signal was increased by about 130% (Figure 6F), while the Piwil1 ChIP-qPCR signal was increased by 50%, so there may be a mechanism by which Meioc regulates H3K9 that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35 nt small RNAs of 18S and 28S rRNAs and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 was revealed in this study and will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We added the following comment in L201-204.

      “The SSC-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided molecular evidence for the antagonism of Meioc to Piwil1 in rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. I corrected the parts pointed out.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We refined Figure 1A and added explanation about SSC, sox17::egfp positive cells, and the SSC-enriched hyperplastic testis in L155-158.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue was cultured with cycloheximide at 0.1, 1, 10, 100, 250, and 500 mM, abnormally strong OP-puro signals, including in nuclei, were found in cells at 10 mM or more. We added these results in Supplemental Figure S2G. In addition, at 1 mM, growth was perturbed in fast-growing 32≤-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L127-130.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I don't have any recommendations for improvement. While I have outlined some of the weaknesses of the paper above, I don't see addressing these questions as necessary for publication of this paper.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript uses the terms 1-2 cell spermatogonia, GSC, and SSC throughout the figures and text. For example, 1-2 cell spermatogonia is used in Figure 1C, GSC is used in Figure 1F, and SSC is used in Figure 1 legend. The use of all three terms without definitions as to how they each relate with one another is confusing, particularly to those outside the zebrafish spermatogenesis field. It would be best to only use one term if the three terms are used interchangeably or to define each term if they represent different populations.

      GSC is a writing mistake. In this study, sox17-positive cells, which have been confirmed to self-renew and differentiate (Kawasaki et al., 2016), are considered SSCs. On the other hand, a comparison of meioc and ythdc2 mutants revealed differences in the composition of each cyst, so we describe the number of cysts confirmed. We added new data that 1-2 cell spermatogonia are sox17-positive in Supplemental Figure S3 (L157-158).

      (2) Figure 1B: What does the "SC" label represent in these figure panels?

      We added the explanation in the Figure legend.

      (3) Fig 7B and S7B show incongruent results, and the text implies that Fig S7B data better reflects in vivo biology. It is not clear how the authors interpret the different results between 7B and S7B.

      Thank you for pointing that out. Fig 7A and 7B were obtained by isolating sox17-positive cells. Because it was difficult to detect nucleoli in the isolated cells, probably due to the isolation procedure, we added S7B, which was analyzed in sectioned tissues. As this reviewer pointed out, S7B reflects the in vivo state better, so we changed S7B to 7B and 7B to S7B.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) For general readers, it is nice to add a scheme of zebrafish spermatogenesis (lines 77-78) together with Figure 1A.

      As mentioned above, we refined Figure 1A.

      (2) Line 28, silence: the word "silence" is too strong here since rDNA is transcribed in some levels to ensure the cell survival.

      Thank you for your comment. We changed "silence" to "maintain low levels."

      (3) Line 60, YTDHC2: Please explain more about what protein YTDHC2 is.

      We added a description of Ythdc2 in the introduction.

      (4) Line 69, Piwil1: Please explain more about what protein Piwil1 is.

      We added a description of Piwil1 in the introduction.

      (5) Figure 1B, sperm: Please show clearly which sperms are in this figure using arrows etc.

      We represented sperm using arrowheads in Fig 1B.

      (6) Figure 1C, SC: Please show what SC is in the legend.

      We added the explanation in the Figure legend.

      (7) Line 83, meiotic markers: should be "meiotic prophase I markers".

      Thank you for pointing out the inaccurate expression description. We revised it.

      (8) Line 84, phosphor-histone H3: Should be "histone H3 phospho-S10 "

      We revised it.

      (9) Figure S1A, PH3: Please add PH3 is "histone H3 phospho-S10 ".

      We revised it.

      (10) Figure S1A, moto+/-: this heterozygous mutant showed an increased apoptosis. If so, please mention this in the text. If not, please remove the data.

      Thank you for pointing that out. The heterozygous mutant did not increase apoptosis, so we removed the data.

      (11) Line 88, no females developed: This means all males in the mutant. If so, what Figure S1B shows? These cells are spermatocytes? No "oocytes" developed is correct here?

      All meioc<sup>mo/mo</sup> zebrafish were males, and the meioc<sup>mo/mo</sup> cells in Fig. S1B are spermatogonia. No spermatocytes or oocytes were observed. To show this, we added "no oocytes" in L90.

      (12) Line 89, initial stages: What do the initial stages mean here? Please explain.

      The “initial stages” was changed to the pachytene stage.

      (13) Figure S1C: mouse Meioc rectangle lacks a right portion of it. Please explain two mutations encode a truncated protein in the main text.

      We apologize. It seems that this portion was lost during the preparation of the manuscript. We corrected it. In addition, we added a description of the protein truncation in L100-101.

      (14) Line 99: What "GRCz11" is.

      GRCz11 refers to the version of the zebrafish reference genome assembly. We added this.

      (15) Figure S2A: Dotted lines are cysts. If so, please mention it in the legend.

      We corrected the figure legend.

      (16) Figure S2B and C:, B1-4, C1-7: Rather use spermatogonia etc as a caption here.

      We corrected the figure and figure legend.

      (17) Line 113, hereafter, wildtype: Should be "wild type" or "wild-type".

      We corrected them.

      (18) Figure 1C: Please indicate what dotted lines mean here.

      We added “Dotted lines; 1-2 cell spermatogonia.”

      (19) Line 113, de novo: Please italicize it.

      We corrected it.

      (20) Line 113-116: Figure 1D shows two populations in the protein synthesis (low and high) in the 1-2-cell stage. Please mention this in the text.

      We added a mention of the two populations.

      (21) Line 121, in vitro: Please italicize it.

      We corrected it.

      (22) Line 138-139, Figure 2A: Please indicate two populations in the rRNA concentrations (low and high) in the 1-2-cell stage. How much % of each cell is?

      We added a mention of the two populations and the percentage of cells in each.

      (23) Figure 2B, cytes: Please explain the rRNA expression in spermatocytes (cytes) in the text.

      The decrease in rRNA signal intensity in spermatocytes was added.

      (24) Figure 2A, lines 147, low signals: Figure 2A did not show big differences between wild type and the mutant. What did the authors mean here? Lower levels of rRNAs in the mutant than in wild type. If so, please write the text in that way.

      We think that it is important to note that we were unable to find cells with upregulated rRNA signals, and therefore changed to “could not find cells with high signals of rRNAs and Rpl15 in meioc<sup>mo/mo</sup> spermatogonia”.

      (25) Figure 2E: Please add a schematic figure of a copy of rDNA locus such as Fig. S3A right.

      We added a schema of rDNA locus and primer sites such as Figure S3A right (now Figure 2F) in Figure 2E.

      (26) Figure S3A: This Figure should be in the main Figure. The quantification of Northern blots should be shown as a graph with statistical analysis.

      We added the quantification and moved the panel to the main figure (Figure 2F).

      (27) Figure 4A: Please show single-color images (red or green) with merged ones.

      We added single-color images in the Figure 4A.

      (28) Line 198, Piwil1: Please explain what Piwil1 is briefly.

      We are sorry, but we could not quite understand the meaning of this comment. To show that Piwil1 is located in the nucleolus, we indicated it as (Figure 4A, arrowhead) in L209.

      (29) Line 198, Ddx4-positive: What is "Ddx4-positive"? Explain it for readers.

      Ddx4 is a marker for germ granules, and the description was changed to reflect this.

      (30) Line 209, Fig. S4D-G: Please mention the method of the detection of piRNA briefly.

      We now state that we sequenced small RNAs of 18-35 nt. Accordingly, we changed the term piRNA to small RNA.

      (31) Line 217: Please mention piwil1 homozygous mutant are inviable.

      We added that piwil1-/- are viable in L231.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) Storyline and Narrative Flow:

      Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area. We have made the following modifications:

      We have revised the text, including the transitions between sections of the Results and the stated objective and role of each analysis. This improves the coherence of the narrative, and we believe it will help readers better understand the main content of the manuscript.

      (2) Immune Cell Activity Analysis:

      Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area. We have made the following modifications:

      Using RNA-Bulk data, we evaluated the tumor immune microenvironment through various methods to assess immune infiltration levels and responses to immunotherapy. We found that the results were largely consistent with those presented in the manuscript, providing strong support for our viewpoints. We also acknowledge the limitations of findings from bioinformatics analysis. In our upcoming research, we plan to develop organoid models with gene expression patterns of both CS1 and CS2 subtypes, using these models as a foundation for studying the tumor immune microenvironment.

      (3) Single-Cell RNA-Seq Validation:

      Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area. We have made the following modifications:

      In this manuscript, we employed the NTP algorithm to classify malignant cells identified by the CopyKAT algorithm using characteristic genes of the CS1 and CS2 subtypes. This approach is similar to the method we used earlier to classify patients in the ICGC cohort with the same subtype genes. We consider this classification method valid.

      After classifying the malignant cells, we performed metabolic and cell communication analyses on the CS1 and CS2 subtype cells, revealing significant differences in biological pathways enriched by differential genes, metabolic levels, and cell signaling patterns. These differences align with variations observed in prior classifications and analyses based on RNA-Bulk data.

      We also acknowledge that validating the classification method solely with the single-cell dataset from this study is insufficient. We analyzed GSE202642 using the same processes and methods as GSE229772, finding that the results were generally consistent, indicating that our classification method exhibits a degree of robustness at the single-cell level.

      (4) Methodological Justification:

      Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area. We have made the following modifications:

      We have updated the methodology section to enhance readers' understanding of the fundamental principles involved. This analysis has two key features: first, it combines 10 machine learning algorithms to generate 101 models and ultimately selects the prognostic prediction model with the highest C-index among these 101 models; second, it utilizes the LOOCV method to analyze the training and validation sets. Compared to the conventional method of randomly dividing the training and validation sets by a fixed ratio, this approach reduces the bias and randomness introduced by the splitting process. Therefore, we believe this analysis can leverage the characteristic genes of the CS1 and CS2 subtypes, combined with existing clinical data from public databases, to yield results that are more accurate and reliable than the prognostic models commonly used in previous literature, such as Cox regression and Lasso regression, as well as other individual algorithms. While this analysis presents advantages over some previous modeling methods, it is essential to recognize that it remains based on analyses conducted using public databases, and that the mathematical logic of the algorithms may obscure certain factors that are clinically relevant to patient prognosis.
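
      The selection step described above can be sketched as follows. This is a simplified illustration under stated assumptions and not the pipeline used in the study: the two "models" are hypothetical lists of precomputed risk scores rather than fitted algorithm combinations, the cohort data are invented, and LOOCV is not shown. The sketch computes Harrell's C-index for each model in each cohort and picks the model with the highest average C-index.

```python
from itertools import combinations
from statistics import mean


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def select_best_model(model_risks, cohorts):
    """Return (best model name, mean C-index per model across cohorts)."""
    scores = {}
    for model, risks_by_cohort in model_risks.items():
        per_cohort = [
            concordance_index(times, events, risks_by_cohort[cohort])
            for cohort, (times, events) in cohorts.items()
        ]
        scores[model] = mean(per_cohort)
    return max(scores, key=scores.get), scores


# Hypothetical cohorts: (survival times, event indicators; 1 = event observed).
cohorts = {
    "train": ([5, 10, 15, 20], [1, 1, 0, 1]),
    "validation": ([3, 6, 9], [1, 0, 1]),
}
# Hypothetical risk scores produced by two candidate models.
model_risks = {
    "model_a": {"train": [0.9, 0.7, 0.4, 0.1], "validation": [0.8, 0.5, 0.2]},
    "model_b": {"train": [0.1, 0.7, 0.4, 0.9], "validation": [0.2, 0.5, 0.8]},
}

best, scores = select_best_model(model_risks, cohorts)
print(best, scores)
```

      Averaging the C-index over the training and validation cohorts, rather than using the training cohort alone, penalizes models that fit only the data they were trained on.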

      (5) Figures and Visualizations:

      Improve the clarity of your figures by addressing the following:

      a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.

      b) Figure 4A: Clearly explain the significance of the blue bar.

      c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.

      d) Figure 7C: Enhance the figure legend to provide more informative details.

      Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area. We have made the following modifications:

      Figure 3A: We clustered the samples based on CS1 and CS2 subtypes and displayed the immune-related cell scores of each sample as a heatmap.

      Figure 4A: The blue bars in the figure represent the average C-index of this algorithm combination in the training dataset TCGA and the validation dataset ICGC, which we have supplemented in the corresponding sections of the text.

      Figure 4B: We described this figure in the results section, which primarily aims to validate whether our prognostic prediction model can predict patient outcomes in the TCGA cohort. The results showed that after performing prognostic risk scoring on patients based on the prediction model and categorizing them into high-risk and low-risk groups, the two groups exhibited significant prognostic differences, with the high-risk group showing worse outcomes compared to the low-risk group. This indicates that our prognostic prediction model can effectively distinguish the prognostic risk differences among patients in the TCGA-LIHC cohort. We also discussed these findings in the discussion section.

      Figure 7C: We used both point color and size to visualize the levels of metabolic scores, resulting in two dimensions in the legend, which actually represent the same information. Therefore, we removed the results that used point size to indicate the levels of metabolic scores.

      (6) Supplementary Materials:

      Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.

      We thank the reviewer for this suggestion, which highlighted deficiencies in this area. We have made the following modifications:

      In the subsequent version of the record, we will upload the important results obtained during the research to GitHub, and in this revision, we have updated some figures that may better explain the results or the robustness of the findings as supplementary materials.

      (7) Recent Literature:

      a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area, and we have made appropriate modifications:

      We have reviewed several studies related to HCC subtype classification and the application of machine learning in this field. In the discussion section, we summarize the significance and limitations of these studies. Additionally, we discuss the characteristics of our study in comparison to previous research in this field.

      (8) Data and Code Availability:

      Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area, and we have made appropriate modifications:

      We have examined the relevant data, code, and materials. We confirm that we have indicated the sources of the data and tools used in the analysis within the manuscript. Moreover, these data and tools are accessible via the websites or references we have provided.

      Reviewer #2 (Recommendations for the authors):

      (1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area, and we have made appropriate modifications:

      We intend to verify our findings in future studies using tumor cell line models and animal models. We aim to identify and intervene with key molecules in the MIF signaling pathway. We will investigate how the MIF signaling pathway affects tumor sensitivity to treatment in both cell line and animal models, along with the underlying mechanisms.

      (2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area, and we have made appropriate modifications:

      We analyzed the GSE14520 dataset from the GEO database, which comprises a cohort of 209 HCC patients and their corresponding RNA sequencing data. We validated the prognostic model obtained in this study using this cohort and found that the model effectively separates patients into high-risk and low-risk prognostic categories, with a significant prognostic difference between the two groups. This is consistent with the results we obtained previously.
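      The risk grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the risk score is taken to be a linear combination of expression values weighted by regression coefficients, the cutoff defaults to the cohort median, and the gene names and coefficients are hypothetical.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over model genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(patients, coefficients, cutoff=None):
    """Assign each patient to a 'high' or 'low' risk group.

    `patients` maps a patient id to a {gene: expression} dict. If no cutoff
    is supplied, the median score of this cohort is used (an assumption;
    a cutoff fixed in the training cohort could be passed instead).
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    if cutoff is None:
        cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, groups, cutoff


# Hypothetical two-gene model and four-patient cohort.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}
patients = {
    "p1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "p2": {"GENE_A": 0.0, "GENE_B": 5.0},
    "p3": {"GENE_A": 4.0, "GENE_B": 0.0},
    "p4": {"GENE_A": 1.0, "GENE_B": 1.0},
}
scores, groups, cutoff = stratify(patients, coefficients)
```

      The resulting groups would then be compared with a survival test such as the log-rank test, which is omitted here for brevity.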

      (3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.

      We have made revisions to the long and complex sentences in the manuscript without compromising its academic integrity and rationality, with the hope that this will help readers better understand the content of this study.

      During the revision process, in addition to addressing the reviewer comments, we conducted a thorough review of the analysis. In the course of this review, we identified a few errors in the data usage and have since corrected the relevant data and figures:

      Figure 4: Due to space constraints, we adjusted the composition of the figures after incorporating the validation results from the GSE14520 dataset.

      Figure 5A: We rechecked the regression coefficients included in the model, updated several more recent prognostic models, and calculated the C-index for 20 prognostic models in the TCGA and ICGC cohorts using a method consistent with previous studies.

      Figure 5C-D: We adjusted the clarity of the figures.

      Figure 8: We reclassified the selected malignant cells and updated the subtype results. Subsequently, based on the repeatedly confirmed typing results, we comprehensively updated the analysis results of the subsequent cell communication network construction, ensuring that the entire analysis process remains consistent with previous findings. We also adjusted the composition of the figure and presented the images that could not be conveniently merged due to space constraints as Figure 9.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite of what was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context to the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants such as Crematogaster smithi have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), where the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We have now provided information at the very beginning of the Material and Method section that the queens had been collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      COMMENTS ON INTRODUCTION:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, having the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the none in the figure as suggested by the reviewer and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      COMMENTS ON MATERIALS AND METHODS:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would require removing eggs as soon as queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced in these colonies, but all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a different abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      COMMENTS ON RESULTS:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid.”
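      One generic way to test whether two egg types are laid in a random order or in clusters is a Wald-Wolfowitz runs test. The sketch below is an illustration only and is not necessarily the analysis used in the manuscript. It relies on the normal approximation, which is rough for short sequences, and the function name and the "V"/"T" coding of viable and trophic eggs are hypothetical.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a two-category sequence.

    Returns (number of runs, z statistic, two-sided p-value). A strongly
    negative z means fewer runs than expected by chance, i.e. eggs of the
    same type tend to be laid in clusters.
    """
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two egg types")
    n1 = sum(1 for s in sequence if s == labels[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts at every change of egg type.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return runs, z, p


# Hypothetical laying order of one queen: V = viable, T = trophic.
clustered = runs_test("VVVVVTTTTT")
alternating = runs_test("VTVTVTVTVT")
```

      In this toy example the clustered sequence has two runs and a negative z, whereas the strictly alternating sequence has ten runs and a positive z.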

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2 B, C), there was no such development for trophic eggs (Fig. 2 E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egg-laying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown and we do not have sufficiently robust data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, as reflected by the fact that there is not just one size peak but several. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      COMMENTS ON DISCUSSION:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions of experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      We thank the editors and the reviewers for their valuable comments. In response to these suggestions, we will add rigorous statistical measures and extend the experimental support of our findings in a revised version. Indeed, as we will show, doing so strengthens all the main claims. Specifically:

      Concerning Reviewer 1:

      - It is important to emphasise that the advantage of deriving shape measures q<sub>p</sub> from Minkowski tensors is their robustness and stability, which are well established from extensive, rigorous mathematical analyses. Introducing q<sub>p</sub> without this connection to revised Minkowski tensors would not allow us to claim this stability property for the considered measures.

      - Even though for a polygon the vertex positions contain the whole geometric information, using q<sub>p</sub> and γ<sub>p</sub> leads to different results, see Fig. 6 for an example.

      - We wholeheartedly agree that our statement on independence of values of q<sub>2</sub> and q<sub>6</sub> can be extended and more quantitatively established by rigorous statistical measures. This is exactly what we will do in the revised version, not only providing statistical measures on the presented data, but also extending our analyses to the published data from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show, these analyses further strengthen this claim, unequivocally establishing the independence of q<sub>2</sub> and q<sub>6</sub> in two different models (active vertex model and multiphase-field model), as well as two different sets of experiments (the ones in the original manuscript, and the published one from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779).
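      That the two shape measures can disagree on the same polygon can be illustrated with a minimal sketch. It assumes the commonly used definitions, which are not spelled out in this response: q<sub>p</sub> as the normalised magnitude of the sum over edges of edge length times exp(i p θ), with θ the outward-normal angle (the irreducible Minkowski tensor form), and |γ<sub>p</sub>| as the normalised magnitude of the sum over vertices of |r|^p exp(i p φ), with r and φ measured from the cell centre. Taking the vertex mean as the centre, and the function names, are illustrative choices.

```python
import cmath
import math


def q_p(vertices, p):
    """Minkowski-type shape measure from edge lengths and outward normals.

    `vertices` must be listed counter-clockwise.
    """
    n = len(vertices)
    psi = 0j
    perimeter = 0.0
    for k in range(n):
        (x0, y0), (x1, y1) = vertices[k], vertices[(k + 1) % n]
        ex, ey = x1 - x0, y1 - y0
        length = math.hypot(ex, ey)
        # Outward normal of a counter-clockwise edge is (ey, -ex).
        theta = math.atan2(-ex, ey)
        psi += length * cmath.exp(1j * p * theta)
        perimeter += length
    return abs(psi) / perimeter


def gamma_p(vertices, p):
    """Vertex-based shape function magnitude, centred on the vertex mean."""
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    num = 0j
    den = 0.0
    for x, y in vertices:
        r = math.hypot(x - cx, y - cy)
        phi = math.atan2(y - cy, x - cx)
        num += r ** p * cmath.exp(1j * p * phi)
        den += r ** p
    return abs(num) / den


# A 2 x 1 rectangle and a regular hexagon, both counter-clockwise.
rect = [(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5)]
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
```

      Under these assumed definitions, the rectangle gives q<sub>2</sub> = 1/3 but |γ<sub>2</sub>| = 0.6, while both measures equal 1 for p = 6 on the regular hexagon.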

      Concerning Reviewer 2:

      To fully address this point, we have extended our analyses to explore the published data of Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show in the revised manuscript, the crossover between nematic and hexatic is specific to the use of γ<sub>p</sub> for characterizing the shape and coarse-graining of the associated order. Using q<sub>p</sub> as the shape measure, this crossover disappears. Therefore, these analyses concretely demonstrate that the crossover is not a robust physical feature of the system and is dependent on the method used to define shape characteristics.

      Concerning Reviewer 3:

      We respectfully note a misunderstanding by the referee: the briefly mentioned approaches of other groups turn out to measure not shape but connections between cells. Conceptually, these approaches are therefore related to bond order parameters. We already comment at the end of the section introducing Minkowski tensors that bond order parameters cannot quantify the shape of a cell. The same argument also holds for other such approaches. In our revised version we will further clarify this distinction, to avoid any confusion or misinterpretation.

    1. Author response:

      As a short response to the public reviews, we would like to outline the following planned revisions:

      (1) Address the antibody concerns as indicated by reviewer 1

      (2) Assess the role of tensin (and possibly KANK), as suggested by reviewers 2 and 3, respectively.

      (3) Validate our main experimental findings using alternative super-resolution approaches, including STED to avoid potential blinking artefacts associated with standard STORM, and most likely DNA-PAINT as a more quantitative technique, as suggested by reviewer 3.

      (4) Implement alternative analytical strategies to DBSCAN, including Voronoi tessellation as suggested by reviewer 3.

      (5) Expand the discussion of the main findings of our work and their biological significance.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion.

      Strengths:

      The manuscript is well-written and presents the findings in a clear manner. The findings are interesting and contribute to a better understanding of Rtf1-mediated epigenetic regulation of fungal morphogenesis and pathogenicity in a major human fungal pathogen, and potentially in other fungal species, as well.

      Weaknesses:

      A major limitation of this study is the absence of genome-wide information on Rtf1-mediated H2B monoubiquitination (H2Bub1), as well as a lack of detail regarding the function of the Plus3 domain. Although overexpression of HMD in the rtf1Δ mutant restored global H2Bub1 levels, it did not rescue certain critical biological functions, such as growth at 39 °C and melanin production (Figure 4C-D). This suggests that the precise positioning of H2Bub1 is essential for Rtf1's function. A comprehensive epigenetic landscape of H2Bub1 in the presence of HMD or full-length Rtf1 would elucidate potential mechanisms and shed light on the function of the Plus3 domain.

      We thank the reviewer (and other reviewers) for this excellent suggestion. We have conducted CUT&Tag assays with the WT, the _rtf1_Δ mutant, and complemented strains carrying either full-length Rtf1 or only the HMD domain, cultured at 30 and 39 °C. We indeed found that the epigenetic landscape of H2Bub1 differs between strains expressing the HMD and strains expressing full-length Rtf1. These results strongly suggest that the distribution of H2Bub1 is regulated by Rtf1, and that H2B modifications at specific loci in the chromosome may contribute to thermal tolerance in C. neoformans. Although these new findings from the CUT&Tag assays shed light on the mechanism of thermal tolerance, we decided not to include them in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine the role of Rtf1 in Cryptococcal biology, and demonstrate that Rtf1 acts independently of the Paf1 complex to exert regulation of Histone H2B monoubiquitylation (H2Bub1). The biological impact of the loss of H2Bub1 was observed in defects in morphogenesis, reduced production of virulence factors, and reduced pathogenic potential in animal models of cryptococcal infection.

      Strengths:

      The molecular data is quite compelling, demonstrating that the Rtf1-dependent functions require only this histone modifying domain of Rtf1, and are dependent on nuclear localization. A specific point mutation in a residue conserved with the Rtf1 protein in the model yeast demonstrates the conservation of that residue in H2Bub1 modification. Interestingly, whereas expression of the HMD alone suppressed the virulence defect of the rtf1 deletion mutant, it did not suppress defects in virulence factor production.

      Weaknesses:

      The authors use two different species of Cryptococcus to investigate the biological effect of Rtf1 deletion. The work on morphogenesis utilized C. deneoformans, which is well-known to be a robust mating strain. The virulence work was performed in the C. neoformans H99 background, which is a highly pathogenic isolate. The study would be more complete if each of these processes were assessed in the other strain to understand if these biological effects are conserved across the two species of Cryptococcus. H99 is not as robust in morphogenesis, but reproducible results assessing mating and filamentation in this strain have been performed. Similarly, C. deneoformans does produce capsule and melanin.

      We thank the reviewer for the suggestion. We have conducted assays to quantify both capsule and melanin production in both the C. neoformans and C. deneoformans strain backgrounds. We found that capsule production was affected in the same pattern in these two serotypes. Interestingly, we found that cell size was significantly affected by deletion of RTF1 in both serotypes. In addition, melanin production was reduced by the deletion of RTF1 in both serotypes; however, complementation with Plus3 or mutated alleles of HMD gave different phenotypes in these two serotypes. These new findings are included in Figure 4 of the revised manuscript.

      There are some concerns with the conclusions related to capsule induction. The images reported in Figure B are purported to be grown under capsule-inducing conditions, yet the H99 panel is not representative of the induced capsule for this strain. Given the lack of a baseline of induction, it is difficult to determine if any of the strains may be defective in capsule induction. Quantification of a population of cells with replicates will also help to visualize the capsular diversity in each strain population.

      We thank the reviewer for raising this concern. We have tested capsule production under capsule-inducing conditions on 10% fetal bovine serum (FBS) agar medium [1]. Under these conditions, the capsule layers surrounding the cells were obvious. We also included a non-capsule-producing control in our assay to aid visualization of the capsule. In addition, we quantified the ratio between the diameters of the capsule layer and the cell body to show the capsular diversity in each strain population. The results are included in Figure 4 of the revised manuscript.
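
      The per-cell ratio described in this response can be sketched as follows. This is a minimal illustration only, assuming the ratio is taken as the total diameter (cell body plus capsule) divided by the cell body diameter; the function names, strain labels, and measurements are hypothetical and are not data or code from the study.

```python
from statistics import mean, stdev


def capsule_ratios(measurements):
    """Return one capsule-to-cell-body diameter ratio per cell.

    `measurements` is a list of (total_diameter, body_diameter) pairs in the
    same unit (e.g. micrometres). The pair layout and the definition of the
    ratio are assumptions made for this sketch.
    """
    ratios = []
    for total, body in measurements:
        # The total diameter includes the cell body, so it cannot be smaller.
        if body <= 0 or total < body:
            raise ValueError(f"implausible measurement: total={total}, body={body}")
        ratios.append(total / body)
    return ratios


def summarize(ratios):
    """Cell count, mean, and sample standard deviation for one strain."""
    return {"n": len(ratios), "mean": mean(ratios), "sd": stdev(ratios)}


# Hypothetical measurements (micrometres), invented for illustration only.
example = {
    "strain_A": [(9.0, 4.5), (10.0, 5.0), (8.4, 4.0)],
    "strain_B": [(5.5, 5.0), (6.0, 5.0), (4.4, 4.0)],
}
summary = {
    strain: summarize(capsule_ratios(cells)) for strain, cells in example.items()
}
```

      Reporting the spread alongside the mean for each strain is one way to convey the capsular diversity within a population rather than a single representative cell.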

      The authors demonstrate that for specific mating-related genes, the expression of the HMD recapitulated the wild-type expression pattern. The RNA-seq experiments were performed under mating conditions, suggesting specificity under this condition. The authors raise the point in the discussion that there may be differences in Rtf1 deposition on chromatin in H99, and under conditions of pathogenesis. The data that overexpression of HMD restores H2Bub1 by western is quite compelling, but does not address at which promoters H2Bub1 is modulating expression under pathogenesis conditions, and when full-length Rtf1 is present vs. only the HMD.

      We thank the reviewer for raising these concerns. Please see our response to Reviewer #1.

      Reviewer #3 (Public Review):

      Summary:

      In this very comprehensive study, the authors examine the effects of deletion and mutation of the Paf1C protein Rtf1 gene on chromatin structure, filamentation, and virulence in Cryptococcus.

      Strengths:

      The experiments are well presented and the interpretation of the data is convincing.

      Weaknesses:

      Yet, one can be frustrated by the lack of experiments that attempt to directly correlate the change in chromatin structure with the expression of a particular gene and the observed phenotype. For example, the authors observed a strong defect in the expression of ZNF2, a known regulator of filamentation, mating, and virulence, in the rtf1 mutant. Can this defect explain the observed phenotypes associated with the RTF1 mutation? Is the observed defect in melanin production associated with altered expression of laccase genes and altered chromatin structure at this locus?

      We completely agree with the reviewer. We have conducted CUT&Tag assays and examined Rtf1-mediated H2Bub1 at these particular gene loci. We found that the distribution of H2Bub1 at the promoter region of ZNF2 and in the gene body of the laccase-encoding gene varied, possibly due to the RTF1 mutation. We would like to save these preliminary findings for another story and not include them in this manuscript, as we mentioned in the response to Reviewer #1.

      (1) Jang, E.-H., et al., Unraveling Capsule Biosynthesis and Signaling Networks in Cryptococcus neoformans. Microbiology Spectrum, 2022. 10(6): p. e02866-22.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show for the first time that deleting GLS from rod photoreceptors results in the rapid death of these cells. The death of photoreceptor cells could result from loss of synaptic activity because of a decrease in glutamate, as has been shown in neurons, changes in redox balance, or nutrient deprivation.

      Strengths:

      The strength of this manuscript is that the author shows a similar phenotype in the mice when Gls was knocked out early in rod development or the adult rod. They showed that rapid cell death is through apoptosis, and there is an increase in the expression of genes responsive to oxidative stress.

      We thank the reviewer for their time reviewing the manuscript and their comments regarding the potential mechanism(s) by which rod photoreceptors rapidly degenerate upon knockout of GLS.

      Weaknesses:

      In this manuscript, the authors show a "metabolic dependency of photoreceptors on glutamine catabolism in vivo". However, there is a potential bias in their thinking that glutamine metabolism in rods is similar to cancer cells where it feeds into the TCA cycle. They should consider that as in neurons, GLS1 activity provides glutamate for synaptic transmission. The modest rescue shown by providing α-ketoglutarate in the drinking water suggests that glutamine isn't a key metabolic substrate for rods when glucose is plentiful. The ERG studies performed on the iCre-Glsflox/flox mice showed a large decrease in the scotopic b wave at saturating flashes which could indicate a decrease in glutamate at the rod synapse as stated by the authors. While EM micrographs of wt and iCre-Glsflox/flox mice were shown for the outer retina at p14, the synapse of the rods needs to be examined by EM.

      We agree with the reviewer that in the presence of sufficient glucose, it appears a lack of GLS-driven glutamine (Gln) catabolism does not drastically alter the levels of TCA cycle metabolites or mitochondrial function as we demonstrated in Figure 4, and supplementation with alpha-ketoglutarate improved outer nuclear layer thickness by only a small amount as observed in Figure 5e. Hence, as we stated in the Results and Discussion, at least in the mouse where Gls is selectively deleted from rod photoreceptors by crossing Gls<sup>fl/fl</sup> mice with Rho-Cre mice (Gls<sup>fl/fl</sup>; Rho-Cre<sup>+</sup>, cKO), Gln’s role in supporting the TCA cycle is not the major mechanism by which rod photoreceptors utilize Gln to suppress apoptosis.

      With regard to GLS-driven Gln catabolism providing glutamate (Glu) for synaptic transmission, we again agree with the reviewer that Glu is an important excitatory neurotransmitter, but it is also a key metabolite necessary for the synthesis of glutathione, amino acids, and proteins. As noted and discussed at length in the manuscript, a lack of GLS-driven Gln catabolism in rod photoreceptors leads to reduced levels of oxidized glutathione (Figure 4D), possibly signaling an overall reduction in the biosynthesis of glutathione, as Glu is directly and indirectly responsible for its synthesis. Furthermore, Gln and GLS-derived Glu play a central role in the biosynthesis of several nonessential amino acids and proteins. To this end, we see a reduction in the level of Glu, which is the product of the GLS reaction and further confirms the loss of GLS function. We also noted a significant decrease in aspartate (Asp), which can be constructed from the carbons and nitrogens of Gln as discussed at length in the manuscript (Figure 6A). Finally, we noted a significant decrease in global protein synthesis in the cKO retina as compared to the wild-type animal as well (Figure 6E). Therefore, the data suggest that GLS-driven Gln catabolism is critical for amino acid metabolism and protein synthesis and, to some degree, redox balance, although the small but statistically significant changes in oxidized glutathione, NADP/NADPH, and redox gene expression may not fully account for the rapid and complete photoreceptor degeneration observed. Future studies are necessary to shed light on the role of redox imbalance in this novel transgenic mouse model.

      Glu also plays a role in synaptic transmission, and we considered this scenario as described in Figure 1 – figure supplement 5. Here, the synaptic connectivity between photoreceptors and the inner retina did not demonstrate significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer nor alterations in the labeling of a key protein (Bassoon) in ribbon synapses. These data suggest that the synaptic connectivity between photoreceptors and second-order neurons was unaltered at P14 in the cKO retina, which is the time just prior to rapid photoreceptor degeneration when Glu was shown to be decreased (Figure 6A).

      With regard to the ERG changes noted in Figure 2, we agree with the reviewer that a large decrease was noted in the scotopic b-wave at P21 and P42 in the cKO. We also agree that, to obtain greater insight into these ERG changes, the ribbon synapse in EM images can be examined. The EM images shown in Figure 1 – figure supplement 4 are from P21, which coincides with the age at which the ERG changes were first noted and when significant photoreceptor degeneration has already occurred. These images were utilized to assess the ribbon synapse for the revised version of the manuscript. As now shown in Figure 1 – figure supplement 4D, ribbon synapses are intact in WT animals as denoted by the yellow boxes. Similarly, the ribbons (yellow arrows) appear structurally intact in the photoreceptors that remain in the P21 cKO retina. These results are in accordance with the lack of significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer as well as the lack of alterations in the labeling of a key protein (Bassoon) in ribbon synapses (Figure 1 – figure supplement 5A and B). While we cannot fully rule out that the decrease in glutamate is altering synaptic transmission, our structural data suggest the synapses remain intact. These data have been added to the revised manuscript.

      However, an even larger reduction in the scotopic a-wave was noted at these ages as well. In animal models that disrupt photoreceptor synaptic function (Dick et al. Neuron. 2003; Johnson et al. J Neuroscience. 2007; Haeseleer et al. Nature Neuroscience. 2004; Chang et al. Vis Neurosci. 2006), a more negative ERG pattern is typically observed with the b-wave altered to a much larger degree than the a-wave. Additionally, in these models that disrupt photoreceptor synaptic transmission, the overall structure of the retina with respect to thickness is maintained (Dick et al. Neuron. 2003) or noted to have modest changes in the outer plexiform layer within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). In contrast, a rapid decline in the outer nuclear layer thickness was observed in the cKO retina after P14 likely contributing to the ERG changes noted in Figure 2. Also, Gln is catabolized to Glu primarily by GLS as suggested by the approximately 50% reduction in Glu levels in the cKO retina (Figure 6A), but other enzymes are also capable of catabolizing Gln to Glu, so Glu levels in the rod photoreceptors are unlikely to be zero. Coupling this with the fact that rods are equipped with a self-sufficient Glu recollecting system at their synaptic terminals (Hasegawa et al. Neuron. 2006; Winkler et al. Vis Neurosci. 1999) and that GLS activity is at least two-fold higher in the photoreceptor inner segments, which support energy production and metabolism, than any other layer in the retina (Ross et al. Brain Res. 1987) suggests that altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina.

      The authors note that the outer segments are shorter but they do not address whether there is a decrease in the number of cones.

      We have adjusted Figure 2E by removing the GLS staining to better highlight the secondary degeneration of cone outer segments, the main point of the Figure, as we had already shown that GLS was cleanly knocked out of rod photoreceptors in Figure 1. Furthermore, qualitatively the number of cones appears the same at P14, P21, and P42 between the WT and cKO, which is consistent with other retinal degeneration models, like rd1 and rd10, where cones do not begin to die until all the rods have degenerated (Xue et al. eLife. 2021).

      Rod-specific Gls KO mice with an inducible promoter were generated by crossing Pde6g-CreERT2 mice with mice homozygous for either the WT or the floxed Gls allele (IND-cKO). In Figure 3 the authors document, by western blots and antibody labeling, that GLS1 expression is lost in the IND-cKO 10 days post tamoxifen. OCT images show a decrease in the thickness of the outer nuclear layer between 17 and 38 days post-TAM. ERGs should be performed on the animals at 10 and 30 days post TAM, before and after major structural changes in rod photoreceptor cells, to determine if changes in light-stimulated responses are observed. These studies could help to parse out the cause of photoreceptor cell death.

      We agree with the reviewer that the IND-cKO is a useful tool to help parse out the cause of photoreceptor cell death in this model as well as shed light on the role of GLS-driven Gln catabolism in photoreceptor synaptic transmission as discussed at length above. Hence, ERG analyses were performed 10 days post TAM, before major structural changes in the ONL are observed. Interestingly, ERG demonstrated statistically significant reductions in the IND-cKO scotopic a- and b-waves as compared to the WT 10 days post TAM. Similarly, photopic ERG demonstrated statistically significant decreases in the b-wave of the IND-cKO retina. These data suggest that GLS-driven Gln catabolism plays a significant role not only in rod photoreceptor survival but also in rod photoreceptor function. These data have been added to Figure 3H-I and are discussed in the corresponding manuscript text.

      To this end, as discussed below and added to Figure 6 – figure supplement 1, amino acid levels, including glutamate (Glu), are already reduced 10 days post TAM. Reductions in the level of Glu may impact synaptic transmission and, as a result, the scotopic b-wave. However, as noted above, altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina, as the b-wave to a-wave ratio is not significantly altered in the IND-cKO retina as compared to the WT retina, suggesting that loss of GLS-driven Gln catabolism impairs both waves to a similar degree.
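
      The reasoning behind the b-wave to a-wave ratio can be illustrated with a short sketch. The function name and all amplitudes below are hypothetical and chosen only to show the arithmetic; they are not measurements from this study.

```python
def b_to_a_ratio(a_wave_uv, b_wave_uv):
    """Ratio of b-wave to a-wave amplitude, using absolute amplitudes.

    Amplitudes are in microvolts. The a-wave is conventionally a negative
    deflection, so absolute values are taken before dividing.
    """
    a = abs(a_wave_uv)
    if a == 0:
        raise ValueError("a-wave amplitude is zero; ratio undefined")
    return abs(b_wave_uv) / a


# Hypothetical amplitudes (microvolts), invented for illustration only.
control = b_to_a_ratio(-200.0, 400.0)        # baseline response
proportional = b_to_a_ratio(-100.0, 200.0)   # both waves halved
selective_b = b_to_a_ratio(-200.0, 100.0)    # b-wave reduced, a-wave preserved
```

      In this toy example a proportional loss of both waves leaves the ratio unchanged, whereas a selective loss of the b-wave, as expected from a primarily synaptic defect, lowers it.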

      Additionally, Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones, which is in accordance with the immunofluorescence image in Figure 3B where GLS is not observed in rod or cone inner segments, unlike in Figure 1B where GLS remains in cones. Hence, the reduction in the photopic b-wave may indicate that loss of GLS-driven Gln catabolism in cones impairs synaptic transmission. As noted in our reply to reviewer #3’s comments, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      The studies in Figure 4 were all performed on iCre-Glsflox/flox and control mice at p14, why weren't the IND-cKO mice used for these studies since the findings would not be confounded by development?

      To gain further insight into the role of GLS-driven Gln catabolism in the maintenance of rod photoreceptors as compared to their development/maturation, we conducted a targeted metabolomic analysis on IND-cKO and WT retinas 10 days post TAM. For the purpose of this manuscript, we have included data regarding changes in amino acid levels in Figure 6 – figure supplement 1. Specifically, levels of glutamate, aspartate and asparagine are all significantly decreased in the IND-cKO retina prior to PR degeneration, which demonstrates that similar to the GLS cKO mouse (i.e. iCre-Gls flox/flox), GLS-driven Gln catabolism is critical for amino acid biosynthesis in mature rod PRs as well.

      In all rescue studies, the endpoint was an ONL thickness, which only addressed rod cell death. The authors should also determine whether there are small improvements in the ERG, which would distinguish the role of GLS in preventing oxidative stress.

      Optical coherence tomography (OCT) provides a sensitive in vivo method to detect small changes in retinal thickness without potential artifacts incurred through histological processing. Considering the Gls cKO retina demonstrates significant and rapid photoreceptor degeneration, we wanted to assess pathways that may be critical to photoreceptor survival downstream of GLS-driven Gln catabolism using rescue experiments with pharmacologic treatment or metabolite supplementation. That said, disruption of GLS-driven Gln catabolism may also significantly alter rod photoreceptor function beyond that which is secondary to photoreceptor cell death as we have demonstrated in the IND-cKO animal for the revised version of this manuscript and discussed in a response above. Therefore, the IND-cKO model provides a unique tool to assess the impact of rescue studies on photoreceptor function as the functional changes occur prior to significant degeneration. Also, unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox) where photoreceptor degeneration starts very early, impairing our ability to capture reliable and robust ERG measurements, the IND-cKO mice are older at the time of functional changes allowing for robust ERG measurements. While the rate of photoreceptor degeneration in both mouse models is similar and the levels of key amino acids are altered similarly in both models, the mechanisms of cell death in developing/maturing photoreceptors may be different than that in mature photoreceptors. Hence, before we can assess if similar rescue experiments impact photoreceptor function via ERG in the IND-cKO mouse, we need to thoroughly examine how these photoreceptors are dying. These experiments and results will be published in a separate manuscript in the future.

      Reviewer #2 (Public Review):

      Summary:

      Photoreceptor neurons are crucial for vision, and discovering pathways necessary for photoreceptor health and survival can open new avenues for therapeutics. Studies have shown that metabolic dysfunction can cause photoreceptor degeneration and vision loss, but the metabolic pathways maintaining photoreceptor health are not well understood. This is a fundamental study that shows that glutamine catabolism is critical for photoreceptor cell health using in vivo model systems.

      Strengths:

      The data are compelling, and the consideration of potential confounding factors (such as glutaminase 2 expression) and additional experiments to examine the synaptic connectivity and inner retina added strength to this work. The authors were also careful not to overstate their claims, but to provide solid conclusions that fit the results and data provided in their study. The findings linking asparagine supplementation and the inhibition of the integrated stress response to glutamine catabolism within the rod photoreceptor cell are intriguing and innovative. Overall, the authors provide convincing data to highlight that photoreceptors utilize various fuel sources to meet their metabolic needs, and that glutamine is critical to these cells for their biomass, redox balance, function, and survival.

      We greatly appreciate the reviewer’s thoughtful comments and time spent reviewing this manuscript.

      Weaknesses:

      Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be of interest to test whether the conditional knockout mice have changes in metabolism (via qPCR such as shown in Figure 4 - Supplemental Figure 1) within the retinal pigment epithelium that may be contributing to the authors' findings in the neural retina. Additionally, the authors have very compelling data to show that inhibition of eIF2a or supplementation with asparagine can delay photoreceptor death via OCT measurements in their conditional knockout mouse model (Figure 6G, H). However, does inhibition of eIF2a or asparagine adversely impact the WT retina? It would also be impactful to know whether this has a prolonged effect, or if it is short-term, as this would provide strength to potential therapeutic targeting of these pathways to maintain photoreceptor health.

      We agree with the reviewer that metabolic communication in the outer retina is crucial to the function and survival of both photoreceptors and RPE. Therefore, we have performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      Additionally, we have added data demonstrating that systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina. Specifically, we performed OCT after 21 days of ISRIB treatment via intraperitoneal delivery in WT mice and show that total retinal, ONL and inner segment/outer segment thickness is unchanged compared to vehicle. These data are now included in Figure 6 – figure supplement 2A. We have also included data to suggest that the effect of ISRIB extends beyond P21 in the cKO mouse. These data, presented in Figure 6 – figure supplement 2B, show that at P28, ISRIB still produces a statistically significant increase in ONL thickness compared to vehicle in cKO animals.

      Reviewer #3 (Public Review):

      Summary:

      The authors explored the role of GLS, a glutaminase, which is an enzyme that catalyzes the conversion of glutamine to glutamate, in rod photoreceptor function and survival. The loss of GLS was found to cause rapid autonomous death of rod photoreceptors.

      Strengths:

      Interesting and novel phenotype. Two types of cre-lines were rigorously used to knockout the Gls gene in rods. Both of the conditional knockouts led to a similar phenotype, i.e. rod death. Histology and ERG were carefully done to characterize the loss of rods over specific ages. A necessary metabolomic study was performed and appreciated. Some rescue experiments were performed and revealed possible mechanisms.

      We thank the reviewer for their comments and appreciation of the methods utilized herein to address the role of GLS-driven Gln catabolism in rod photoreceptors.

      Weaknesses:

      No major weaknesses were identified. The mechanism of GLS-loss-induced rod death seems not fully elucidated by this study but could be followed up in the future, and the same for GLS's role in cones.

      We agree with the reviewer that the downstream metabolic and molecular mechanisms by which Gln catabolism impacts rod photoreceptor health are not fully elucidated. Defining these mechanisms will advance our understanding of photoreceptor metabolism and identify therapeutic targets promoting photoreceptor resistance to stress. Future studies are underway to uncover these mechanisms. Additionally, while outside the scope of the current manuscript, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) The results could start at line 135, but the first paragraph isn't necessary. The data is published and could be referred to in the introduction.

      We appreciate the reviewer’s suggestion to shorten the beginning of the Results section; however, we believe the supplementary data described in these lines confirm the scRNAseq gene expression data, while adding GLS expression and localization data within the retina. The scRNAseq data and their publication were noted in the introduction, so we removed the sentence in lines 117-119 that restates these results to shorten this section. We also reduced redundancy by removing an introductory sentence to the second Results paragraph.

      (2) "However, like other metabolically-demanding cells, recent work has demonstrated that PRs have the flexibility to utilize fuel sources beyond glucose to meet their metabolic needs (Adler et al., 2014; Du, Cleghorn, Contreras, Linton, et al., 2013; Grenell et al., 2019; Joyal et al., 2016; Xu et al., 2020)." The paper by Daniele et al. demonstrated that glucose is essential for maintaining the viability of rod photoreceptor cells.

      We thank the reviewer for highlighting published literature that we regrettably overlooked. The reference for Daniele et al. has now been included.

      (3) "Single-cell RNA sequencing data has demonstrated that Gls is expressed throughout the human and mouse retina and much greater than Gls2 (Voigt et al., 2020)." The authors should indicate the specific databases searched in Spectacle.

      We appreciate the reviewer’s attention to detail and have now included the references in the Introduction for GSE63473 from Macosko et al. and GSE142449 from Voigt et al., which were the databases we used in Spectacle to assess Gls levels in the mouse and human retina, respectively.

      References:

      (1) Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015 May 21;161(5):1202-1214. doi: 10.1016/j.cell.2015.05.002. PMID: 26000488; PMCID: PMC4481139.

      (2) Voigt AP, Binkley E, Flamme-Wiese MJ, Zeng S, DeLuca AP, Scheetz TE, Tucker BA, Mullins RF, Stone EM. Single-Cell RNA Sequencing in Human Retinal Degeneration Reveals Distinct Glial Cell Populations. Cells. 2020 Feb 13;9(2):438. doi: 10.3390/cells9020438. PMID: 32069977; PMCID: PMC7072666.

      (4) The immunolabeling in Figure 2 looks like the images are overexposed, and the Gls antibody is labeling the outer segment, not just the inner segment of photoreceptors.

      We thank the reviewer for their comments regarding our immunofluorescence data. There was background staining of the outer segment in both the WT and cKO retina, with decreased GLS staining in the inner segment of the cKO rod photoreceptors at P14, demonstrating loss of GLS in rod photoreceptors similar to Figure 1B. For Figure 2E, we have provided adjusted images with PNA staining only that better represent the secondary cone degeneration that occurs in the rod photoreceptor-specific Gls cKO, which is the take-home point of Figure 2E.

      (5) The authors could use a glutamate antibody to compare it to Gls KO mice as done in Davanger, S., Ottersen, O.P. and Storm-Mathisen, J. (1991), Glutamate, GABA, and glycine in the human retina: An immunocytochemical investigation. J. Comp. Neurol., 311: 483-494. https://doi.org/10.1002/cne.903110404

      We appreciate the reviewer’s suggestion to assess glutamate levels in the wild-type and Gls KO retina via antibody labeling. Our targeted metabolomics studies in Figure 6A provide quantitative evidence that glutamate, the product of the GLS-catalyzed reaction, is decreased as one would expect in the Gls KO retina. The antibody would add to these data by providing the localization of glutamate in the retina. With a rod photoreceptor-specific genetic KO, we would expect glutamate levels to be decreased in these cells. The antibody may also show that glutamate is not only decreased in the rod photoreceptor inner segment, where GLS predominates, but also in the synaptic terminal in accordance with the reviewer’s concerns regarding the impact of GLS KO on synaptic transmission. We have addressed this concern at length above, adding TEM images of the ribbon synapses in the GLS KO retina, and ERG analyses from the IND-cKO animals prior to significant degeneration. In the end, we agree with the reviewer that reduced Glu levels in the GLS cKO retina may impact synaptic transmission to a degree, but the synapses remain intact based on immunofluorescence and TEM analyses and a negative ERG pattern is not observed in the GLS cKO (i.e. iCre-Gls flox/flox) or IND-cKO mouse. As noted above, the structure of the retina in models that disrupt photoreceptor synaptic transmission is maintained (Dick et al. Neuron. 2003) or noted to have modest changes within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). So, the impact of the reduced Glu levels on synaptic transmission in the GLS KO retina is unlikely to account in full for the rapid and profound photoreceptor degeneration observed. That said, the IND-cKO mouse, which allows us to assess photoreceptor function prior to significant degeneration unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox), demonstrates GLS-driven Gln catabolism plays a significant role in photoreceptor function but still does not demonstrate a negative ERG pattern. Therefore, assessing Glu localization in this mouse model 10 days post TAM will be informative as to how GLS-driven Gln catabolism impacts photoreceptor function prior to degeneration. The IND-cKO mouse model is currently being extensively characterized for future publication.

      Reviewer #2 (Recommendations For The Authors):

      Main Concerns:

      (1) The authors checked for Gls2 compensation at P14 in the mouse retina. However, this data would be more compelling with an additional timepoint, particularly at P21 which is used in many of their figures throughout the study.

      We thank the reviewer for their suggestion. Figure 1-figure supplement 1D demonstrates no change in Gls2 gene expression at P14 between the WT and cKO retina. With regards to the reviewer’s concern, in Figure 1-figure supplement 1E of the original submission, we demonstrate that the expression of GLS2 is not increased in the cKO retina at P21 via immunofluorescence.

      (2) Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be compelling to see whether the cKO mice have changes in metabolism (via qPCR such as shown in Supplementary Figure 1 for Figure 4) within the RPE that may be contributing to their findings in the neural retina. Additionally, mention of this crosstalk and how it may impact their results should be added to the discussion.

      We appreciate the reviewer’s concern for metabolism changes in the RPE of Gls cKO mice. In agreement with reviewer 2, we performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      (3) The authors use a tamoxifen-inducible cKO model to support their findings in developed rods. However, in Figure 3A it appears that this model has a greater reduction in GLS compared to the Rho-cre mouse model. Can the authors discuss this? Is this cre more efficient at targeting rods or is it leaky and may have affected other retinal cells?

      We thank the reviewer for pointing out this interesting result associated with using the Pde6g-Cre-ERT2 mouse line. Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones upon TAM induction. To this end, the immunofluorescence image in Figure 3B shows GLS is knocked out in both rod and cone inner segments, unlike in Figure 1B where GLS remains in cones when using the rod photoreceptor-specific, Gls<sup>fl/fl</sup> Rho-Cre<sup>+</sup> mouse. As such, as the astute reviewer noted, the fact that Western blot demonstrates greater reduction in GLS protein content fits with the protein being knocked out of both rods and cones. We have added this note about the mouse model in the corresponding text.

      (4) The authors have very compelling data to show that inhibition of eIF2a can delay photoreceptor death via OCT measurements in their cKO mouse model (Figure 6G). However, does ISRIB adversely impact the WT retina? WT vehicle and ISRIB should be shown. It would also be compelling to know whether this has a prolonged effect, or if it is short-term (i.e. would the effect still be present at P42)?

      We appreciate the reviewer’s comments regarding antagonizing the effects of p-eIF2a to prolong photoreceptor survival in the Gls cKO retina. As described above, we have data demonstrating systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina (Figure 6-figure supplement 2A). Specifically, we treated WT animals with daily intraperitoneal ISRIB starting at P5 and performed OCT at P21 to show that total retinal, ONL and the inner segment/outer segment thickness is unchanged compared to vehicle-treated WT animals. Additionally, we have included data demonstrating that the photoreceptor neuroprotective effect of ISRIB treatment in the Gls cKO mouse extends beyond P21 (Figure 6-figure supplement 2B).

      (5) For Figure 6H, same as point #4.

      While we have not specifically assessed potential retinal toxicity secondary to systemic Asn supplementation, oral Asn supplementation (up to 100mg/kg/day) was provided to patients for 24 months and found to be well-tolerated (PMID:31123592). Allometric scaling of this dose to the mouse would yield a mouse dose of 1234 mg/kg/day, which is much greater than the 200mg/kg/day dose provided here (PMID: 27057123). Additionally, a 90-day toxicity study of Asn in rats demonstrated a no observed adverse effect level of 1.62g/kg bodyweight/day in males and 1.73g/kg bodyweight/day in females (PMID: 18508175). The lower dose in that study equates to a mouse dose of 3.2g/kg bodyweight/day, well above the mouse dose utilized in this report. As such, future studies should focus on a dose-response relationship with Asn supplementation, and as the reviewer suggested, determining the duration of effect with Asn supplementation.
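      The dose conversions quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the widely used body-surface-area method with Km factors of 37 (human), 6 (rat) and 3 (mouse). The response does not state which factors were used, so the `KM` table and the `convert_dose` helper are assumptions introduced here, not the authors' own calculation.

```python
# Assumed Km factors (body weight divided by body surface area) for
# body-surface-area dose scaling; values are the commonly tabulated ones.
KM = {"human": 37.0, "rat": 6.0, "mouse": 3.0}


def convert_dose(dose_mg_per_kg: float, from_species: str, to_species: str) -> float:
    """Scale a mg/kg dose between species by the ratio of their Km factors."""
    return dose_mg_per_kg * KM[from_species] / KM[to_species]


# Human oral Asn dose of 100 mg/kg/day expressed as a mouse dose.
human_to_mouse = convert_dose(100.0, "human", "mouse")

# Rat NOAEL of 1.62 g/kg/day (1620 mg/kg/day) expressed as a mouse dose.
rat_to_mouse = convert_dose(1620.0, "rat", "mouse")

print(f"human 100 mg/kg/day  -> mouse {human_to_mouse:.0f} mg/kg/day")
print(f"rat 1620 mg/kg/day   -> mouse {rat_to_mouse:.0f} mg/kg/day")
```

      Under these assumed factors, both converted values sit well above the 200 mg/kg/day mouse dose used in the study, in line with the figures quoted in the response.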

      (6) Some of the results section belongs in the introduction or discussion and can be moved.

      We have addressed the reviewer’s concern by moving some of the results to the discussion and removing statements in the results that were either noted in the Introduction or covered in the Discussion.

      Minor Concerns:

      (1) Scale bar mentions in the figure legends use plural when only one is present, or in some cases are missing. A scale bar should be added to the OCT images if possible.

      We appreciate the reviewer’s attention to detail, and information regarding scale bars has been updated in the figure legends.

      (2) For Figures 1I and J, the sample size changes when J is a quantification of I. Please correct.

      We have corrected the sample size to be consistent between Figures 1I and J.

      (3) In Figure 1 - Figure Supplement 3 the P42 timepoint is not mentioned in the legend. Please correct.

      We have now included the P42 timepoint in the legend for Figure 1 – Figure Supplement 3 as well as the manuscript text.

      (4) In Figure 1 - Figure Supplement 5 the wrong P value is mentioned in the legend. Please correct.

      We have corrected the P value in the legend for Figure 1 – Figure Supplement 5.

      (5) Can the authors double-check their ERG light intensity settings? They seem high. Please confirm if they are correct.

      We appreciate the reviewer’s concern for ERG light intensity settings and have confirmed the settings used in the study were 32 cd*s/m<sup>2</sup> and 100 cd*s/m<sup>2</sup> for scotopic and photopic ERG recordings, respectively.

      (6) The legend key in Figure 2A would be more helpful if the axis were present by the representative traces.

      We thank the reviewer for the suggestion of adding axes to the ERG traces. Figure 2A has been updated to reflect this modification.

      (7) Can the authors check that the error bars are present in Figure 5E?

      We appreciate the reviewer’s concern for error bars in Figure 5E, which are included in the figure. The standard error in this experiment is so small that the symbols overlap with the error bars.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      (1) Figure 6: ISRIB seems to give the most dramatic rescue of cKO GLS in P21 rods. Does it completely prevent rod death? i.e. What's the ONL thickness of P21 WT control? What's the ISRIB rescue of an older cKO animal, say P35?

      The ONL thickness of P21 WT control is on average 0.06 mm (Figure 1E), while the ONL thickness of the Gls cKO retina with ISRIB treatment at P21 is on average 0.044 mm. Therefore, rod death is not completely prevented with ISRIB but rather, rod photoreceptor survival is prolonged. As noted above, we have provided data to demonstrate that the photoreceptor neuroprotective effect of ISRIB lasts beyond P21 (Figure 6-figure supplement 2B).

      (2) What's the mechanistic link between ISR and GLS beyond current speculation? Does GLS have other unknown functions beyond converting glutamine to glutamate? Any novel insights from GLS protein structure?

      We thank the reviewer for this thoughtful question. It is certainly possible that GLS has other functions outside of its role in glutaminolysis. It is well known that other metabolic enzymes have moonlighting functions including hexokinase 2, which has been shown to be important in preventing intrinsic apoptosis through blocking the binding of pro-apoptotic proteins to the mitochondria. While not directly related to ISR, a single report suggests GLS functions non-canonically in Gln-deprived states, promoting mitochondrial fusion to suppress ROS production (PMID: 29934617). Investigating the moonlighting functions of metabolic enzymes is part of our ongoing research program and GLS is included in these studies.

      (3) Just curious about GLS cKO in cones. Any similar phenotype?

      We appreciate the reviewer’s curiosity regarding Gls cKO in cones and this study is currently ongoing with a poster presented at ARVO 2024 (Subramanya et al; Glutaminase-driven glutamine catabolism supports cone photoreceptor metabolism, function, and structure. Invest. Ophthalmol. Vis. Sci. 2024;65(7):193) and a manuscript in preparation. As discussed above, GLS knock out in cones likely impacts their function, in accordance with the data presented at ARVO 2024.

      Recommendations for improving the writing and presentation.

      (1) In the Discussion, lines 458-466, it's incorrect to compare the importance of glucose metabolism to GLS-dependent pathway to photoreceptors in this way. An alternative explanation: glucose metabolism is so important that the system has many redundancies, e.g. HK1 exists in addition to HK2, thus single gene KO leads to no phenotype. The only fair comparison is nutrient deprivation, e.g. taking out glucose or glutamine from retina explants (Punzo et al., 2009).

      The reviewer makes an excellent point. While we do not see an upregulation of GLS2 in the retina or rod PRs upon GLS knockout (Figure 1-figure supplement 1 D and E), loss of Gls in rod PRs does alter the expression of many metabolism-related genes (Figure 4-figure supplement 1). We alluded to these data and the reviewer’s point in the second paragraph of the discussion: “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose or rewire their metabolism to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      In the revised article, we have amended these sentences to include the importance of metabolic redundancies. “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose, rewire their metabolism, or utilize metabolic redundancies to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      (2) Please discuss the mosaic activity of Rho-cre used in this study, as described in the original study (Le et al 2006). Line 221 (Li et al 2005) seems to be a different Rho-Cre created by a different group. Please make sure the citation is correct and consistent.

      We apologize for the confusion and have corrected the reference on line 221 to Le et al, 2006. The reviewer is correct that the original report (Le et al. 2006) demonstrated a mosaic of Cre-mediated recombination in rod photoreceptors and rod bipolar cells in the mouse line that had the shorter (0.2 kb) mouse opsin promoter-controlled Cre. In contrast, this same report showed only Cre-mediated recombination in rod photoreceptors in another line that utilized a long (4.1 kb) mouse opsin promoter-controlled Cre. We have published using this latter promoter-controlled Cre recombinase in at least 5 different mouse models (Wubben et al. 2017; Weh et al. 2020; Weh et al. 2023; Subramanya et al. 2023; the current report), and in all these models, we observe clear and consistent knockout by immunofluorescence only in rod photoreceptors with residual protein in cones and no significant change in protein expression in the INL where bipolar cells reside. Western blots confirm the reduction in protein expression.

      (3) The authors should provide representative images of retina cross-sections for key rescue data (Figure 6G&H).

      As requested by Reviewer 3, representative histology images of retina cross-sections for the ISRIB and Asn rescue experiments in Gls cKO mice at P21 are now included in the manuscript in Figure 6 – figure supplement 3.

      Minor corrections to the text and figures.

      (1) Spell out Gln in the Abstract when used for the first time.

      We have included glutamine (Gln) in the abstract upon first use.

      (2) Line 433, Figure 6G should be 6H.

      Thank you for the correction, the manuscript has been updated.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The study aimed to investigate the significant impact of criterion placement on the validity of neural measures of consciousness, examining how different standards for classifying a stimulus as 'seen' or 'unseen' can influence the interpretation of neural data. They conducted simulations and EEG experiments to demonstrate that the Perceptual Awareness Scale, a widely used tool in consciousness research, may not effectively mitigate criterion-related confounds, suggesting that even with the PAS, neural measures can be compromised by how criteria are set. Their study challenged existing paradigms by showing that the construct validity of neural measures of conscious and unconscious processing is threatened by criterion placement, and they provided practical recommendations for improving experimental designs in the field. The authors' work contributes to a deeper understanding of the nature of conscious and unconscious processing and addresses methodological concerns by exploring the pervasive influence of criterion placement on neural measures of consciousness and discussing alternative paradigms that might offer solutions to the criterion problem.

      The study effectively demonstrates that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' significantly impacts the validity of neural measures of consciousness. The authors found that conservative criteria tend to inflate effect sizes, while liberal criteria reduce them, leading to potentially misleading conclusions about conscious and unconscious processing. The authors employed robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence. The results from both experiments confirm the predicted confounding effects of criterion placement on neural measures of unconscious and conscious processing.

      The results are consistent with their hypotheses and contribute meaningfully to the field of consciousness research.
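      The criterion effect summarized above can be illustrated with a toy calculation. This is a minimal sketch under an assumed one-dimensional, equal-variance signal detection model (no-target trials drawn from N(0, 1), target trials from N(d', 1), and a trial reported 'unseen' when its internal signal falls below the criterion c). It is not the authors' simulation, and the function names are hypothetical. It covers only the 'unseen' case and shows that the target versus no-target difference among 'unseen' trials grows as the criterion becomes more conservative.

```python
import math


def normal_pdf(a: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def normal_cdf(a: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def mean_unseen(mu: float, criterion: float) -> float:
    """Mean of N(mu, 1) restricted to values below the criterion ('unseen')."""
    a = criterion - mu
    return mu - normal_pdf(a) / normal_cdf(a)


def unseen_effect(d_prime: float, criterion: float) -> float:
    """Target minus no-target mean signal among trials sorted as 'unseen'."""
    return mean_unseen(d_prime, criterion) - mean_unseen(0.0, criterion)


d_prime = 1.0
for label, criterion in [("liberal", 0.0), ("conservative", 1.5)]:
    effect = unseen_effect(d_prime, criterion)
    print(f"{label:>12} criterion c={criterion:.1f}: unseen effect = {effect:.3f}")
```

      In this toy model sensitivity (d') is identical in both conditions, so any change in the 'unseen' effect comes purely from where the criterion is placed.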

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) In the realm of research methodology, conducting post-hoc sorting based on subject reports raises an issue. This operation leads to an imbalance in the number of trials between the two conditions (Target and NonTarget) during the decoding process. Such trial number disparity introduces bias during decoding, likely contributing to fluctuations in neural decoding performance. This potential confounding factor significantly impacts the interpretation of research findings. The trial number imbalance may cause models to exhibit a bias towards the category with more trials during the learning process, leading to misjudgments of neural signal differences between the two conditions and failing to accurately reflect the distinctions in brain neural activity between target and non-target states. Therefore, it is recommended that the authors extensively discuss this confounding factor in their paper. They should analyze in detail how this factor could influence the interpretation of results, such as potentially exaggerating or diminishing certain effects, and whether measures are necessary to correct the bias induced by this imbalance to ensure the reliability and validity of the research conclusions.

      We would like to thank reviewer 2 for their positive words and for taking the time to evaluate our manuscript. In response to this asserted weakness, we would like to point out that the issue of trial imbalances was already comprehensively addressed in the manuscript. No trial imbalances are present in the analyzed data for any of the conditions, so that none of our reported results could have been impacted by this. This was done through the following set of measures:

      (1) Training data (method section): “a linear discriminant analytic (LDA) classifier was trained for each participant using all trials from all sessions (3 sessions in Experiment 1, 2 sessions in Experiment 2) to discriminate target from no-target trials based on EEG data, irrespective of seen/unseen responses and irrespective of the response criterion. To maximize signal-to-noise ratio, we applied a leave-one-person-out cross validated decoding scheme by using all classifiers from all participants except the participant that was being tested (separately for Experiment 1 and for Experiment 2). This leave-one-person-out cross validation procedure maximized the available data for training without requiring k-folding on subsets of cells with low response counts, so that all test sets were classified by the same fully independent classifiers. A single time series of classification performance across time was obtained for every participant (every testing set) by averaging classification performance across all classifiers that tested that set (see Methods and supplementary Figure S2 for details).”

      This leave-one-person-out cross validation scheme made sure that no trial selection needed to be performed to analyze conservative or liberal conditions. Both conditions were classified using the same classifier, consisting of all data from the other participants.

      (2) Testing data (methods section): “To ensure that differences resulting from post hoc sorting could not be explained by differences in signal-to-noise ratio resulting from disparities in trial counts in the testing set, we equated trial counts between the liberal and conservative condition within each participant by randomly selecting the same number of trials from overrepresented cells (for Experiment 1, this was done at the level of ‘seen’ and ‘unseen’ responses, for experiment 2 the trial counts were equated at each of the PAS levels, see methods for details). As a result, response-contingent conditions in the liberal and conservative conditions had identical input for all classification analyses. Although different trial counts in the testing set might affect the precision with which AUC is estimated in a decoding analysis, it does not affect the size of AUC itself. Trial count equation was merely performed to make sure the liberal and conservative condition were as comparable as possible.”

      Indeed, we also report at the end of this section that running the same analyses without selecting trials in the test set yielded qualitatively identical results: “Analyzing the data without equating trial counts resulted in qualitatively identical results.”
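      The trial-count equation described in the testing-data step can be sketched as follows. This is a hedged illustration, not the authors' code: the `equate_trial_counts` helper, the dictionary layout (response cell mapped to a list of trial identifiers) and the fixed random seed are all assumptions made for the example. For each response cell, the overrepresented condition is randomly subsampled down to the size of the smaller one.

```python
import random


def equate_trial_counts(liberal, conservative, seed=0):
    """Subsample each response cell so both conditions keep the same trial count.

    liberal, conservative: dicts mapping a response cell (e.g. 'seen', 'unseen',
    or a PAS level) to a list of trial identifiers for one participant.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    equated_liberal, equated_conservative = {}, {}
    for cell in sorted(liberal.keys() & conservative.keys()):
        n = min(len(liberal[cell]), len(conservative[cell]))
        # Random selection without replacement from each condition.
        equated_liberal[cell] = sorted(rng.sample(liberal[cell], n))
        equated_conservative[cell] = sorted(rng.sample(conservative[cell], n))
    return equated_liberal, equated_conservative


# Hypothetical trial identifiers for a single participant.
liberal_trials = {"seen": list(range(0, 120)), "unseen": list(range(120, 160))}
conservative_trials = {"seen": list(range(200, 250)), "unseen": list(range(250, 360))}

lib, con = equate_trial_counts(liberal_trials, conservative_trials)
for cell in lib:
    print(cell, len(lib[cell]), len(con[cell]))
```

      Subsampling within each cell keeps the comparison between the liberal and conservative conditions matched on test-set size, which is the property the quoted passage relies on.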

      To remove any lack of clarity about this, we now also briefly report in the beginning of the discussion section that the results cannot be explained by unequal trial counts:

      “We found that in both experiments, criterion shifts modulated effect size in neural measures of ‘unconscious’ (unseen) and/or ‘conscious’ (seen) processing, and that this happens even though the conservative and liberal condition used the same independent training data (identical classifiers), and even though the trial counts in the test sets were equated for the conservative and liberal condition.”

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participant reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      Our initial review identified a lack of measures of variance as one potential weakness of this work. However, we agree with the authors' response that plotting individual datapoints for each condition is indeed a good visualization of variance within a dataset.

      Impact of the Work:

      This study effectively demonstrates a phenomenon that, while understood within the context of signal detection theory, has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (1) The rationale for performing genomics, transcriptional, and proteomics work in 293T cells is not discussed. Further, there are no functional readouts mentioned in the 293T cells with expression of the fusion-oncogenes. Did these cells have any phenotypes associated with fusion-oncogene expression (proliferation differences, morphological changes, colony formation capacity)? Further, how similar are the gene expression signatures from RNA-seq to rhabdomyosarcoma? This would help the reader interpret how similar these cell models are to human disease.

      We appreciate the reviewer’s comments and understand the limitation of HEK293T cell culture. HEK293T cells were used as a surrogate system that enabled us to systematically examine and compare the transcriptional activation mechanisms between VGLL2-NCOA2/TEAD1-NCOA2 and YAP/TAZ. HEK293T cells have previously been used as a model system to study the signaling and transcriptional mechanisms of the Hippo/YAP pathway (1,2). Our data also showed that the ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 in HEK293 cells can promote proliferation (Figure 1-figure supplement 1B), consistent with their potential oncogenic function.

      (2) TEAD1::NCOA2 fusion-oncogene model was not credentialed past H&E, and expression of Desmin. Is the transcriptional signature in C2C12 or 293T similar to a rhabdomyosarcoma gene signature?

      We understand the reviewer’s concern. A VGLL2-NCOA2 in vivo tumorigenesis model generated by C2C12 cell orthotopic transplantation has recently been reported, and it exhibits similar characteristics to zebrafish transgenic tumors as well as human scRMS samples that carry the VGLL2-NCOA2 fusion (3). Due to the similar transcriptional and oncogenic mechanisms employed by both VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins, we expect that the TEAD1-NCOA2 dependent C2C12 transplantation model will closely resemble that induced by VGLL2-NCOA2.

      (3) For the fusion-oncogenes, did the HA, FLAG, or V5 tag impact fusion-oncogene activity? Was the tag on the 3' or 5' of the fusion? This was not discussed in the methods.

      To address the reviewer’s concern, we carefully compared the transcriptional activity of the fusion proteins with the HA tag at the 5’ end or FLAG and V5 tag at the 3’ end. We found that neither the tag type nor its location significantly affects the ability of VGLL2-NCOA2 and TEAD1-NCOA2 to induce downstream gene transcription, measured by qPCR. The data is summarized in Figure 1-figure supplement 1 G-H.

      (4) Generally, the lack of details in the figures, figure legends, and methods make the data difficult to interpret. A few examples are below:

      a. Individual data points are not shown for figure bar plots (how many technical or biological replicates are present and how many times was the experiment repeated?).

      As requested, we have added the individual data points to the bar plots. The Method section now includes information on the number of biological replicates and the times the experiments were repeated.

      b. What exons were included in the fusion-oncogenes from VGLL2 and NCOA2 or TEAD1 and NCOA2?

      We have now included the exon structure organization of VGLL2-NCOA2 or TEAD1-NCOA2 fusions in Figure 1-figure supplement 1A.

      c. For how long were the colony formation experiments performed? Two weeks?

      We have included more detailed information about the colony formation assay in the Methods section.

      d. In Figure 2D, what concentration of CP1 was used and for how long?

      The CP1 concentration and treatment duration information has now been included in the figure legend and Methods section.

      e. How was A485 resuspended for cell culture and mouse experiments, what is the percentage of DMSO?

      The Methods section now includes detailed information on how A485 is prepared for in vitro and in vivo experiments.

      f. How many replicates were done for RNA-seq, CUT&RUN, and ATACseq experiments?

      RNA-seq was done with three biological replicates and CUT&RUN and ATAC-seq were performed with two biological replicates. This information is now included in the Methods section for clarification.

      Reviewer #2 (Public Review):

      In the manuscript entitled "VGLL2 and TEAD1 fusion proteins drive YAP/TAZ-independent transcription and tumorigenesis by engaging p300", Gu et al. studied two Hippo pathway-related gene fusion events (i.e., VGLL2-NCOA2, TEAD1-NCOA2) in spindle cell rhabdomyosarcoma (scRMS) and showed that their fusion proteins can activate Hippo downstream gene transcription independent of YAP/TAZ. Using the BioID-based mass spectrometry analysis, the authors revealed histone acetyltransferase CBP/p300 as specific binding proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Pharmacologically targeting p300 inhibited the fusion proteins-induced Hippo downstream gene transcription and tumorigenic events.

      Overall, this study provides mechanistic insights into the scRMS-associated gene fusions in tumorigenesis and reveals potential therapeutic targets for cancer treatment. The manuscript is well-written and easy to follow.

      Here, several suggestions are made for the authors to improve their study.

      Main points

      (1) The authors majorly focused on the Hippo downstream gene transcription in this study, while a significant portion of genes regulated by the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins are non-Hippo downstream genes (Figure 3). The authors should investigate whether the altered Hippo pathway transcription is essential for VGLL2-NCOA2 and TEAD1-NCOA2-induced cell transformation and tumorigenesis. Specifically, they should test if treatment with the TEAD inhibitor can reverse the cell transformation and tumorigenesis caused by VGLL2-NCOA2 but not TEAD1-NCOA2. In addition, it is important to examine whether YAP-5SA expression can rescue the inhibitory effects of A485 on VGLL2-NCOA2 and TEAD1-NCOA2-induced colony formation and tumor growth. This will help clarify whether Hippo downstream gene transcription is important for the oncogenic activities of these two fusion proteins.

      We thank the reviewer for the comments. Although we have not tested the small-molecule TEAD inhibitor on VGLL2-NCOA2- or TEAD1-NCOA2-induced cell transformation and tumorigenesis, we expect that TEAD inhibition will block VGLL2-NCOA2- but not TEAD1-NCOA2-induced oncogenic activity. This is because TEAD1-NCOA2 lacks the auto-palmitoylation sites and the hydrophobic pocket in the C-terminal YAP-binding domain of TEAD1 that the small-molecule TEAD inhibitor occupies (4). We also appreciate the reviewer’s suggestion of YAP5SA rescue experiments. However, due to its strong oncogenic activity, YAP5SA itself can induce robust downstream transcription and cell transformation with or without A485 treatment, as shown in Figure 5. Thus, this experiment is unlikely to address whether non-Hippo downstream genes induced by the fusions are important for cell transformation and tumorigenesis. Because of the distinct nature of the transcriptional and chromatin landscapes controlled by VGLL2-NCOA2/TEAD1-NCOA2 and YAP, we speculate that both Hippo and non-Hippo-related downstream genes contribute to the oncogenic activation and tumor phenotypes induced by the fusion proteins.

      (2) Rationale for selecting CBP/p300 for functional studies needs to be provided. The BioID-MS experiment identified many interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins (Table S4). The authors should explain the scoring system used to identify the high-interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Was CBP/p300 among the top candidates on the list? Providing this information will help justify the focus on CBP/p300 and validate their importance in this study.

      We appreciate the reviewer’s point. CBP/P300 is among the top hits in our proteomics screens of both VGLL2-NCOA2 and TEAD1-NCOA2. Our focus on CBP/P300 is mainly due to the well-established interactions between CBP/P300 and the NCOA family of transcriptional co-activators, in which the CBP/P300-NCOA complex plays a central role in mediating nuclear receptor-induced transcriptional activation (5). In addition, our data are consistent with another recurrent VGLL2 fusion identified in scRMS, VGLL2-CITED2 (6), whose C-terminal fusion partner, CITED2, is a known CBP/P300-interacting protein (7).

      (3) p300 was revealed as a key driver for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins-induced transcriptome alteration and tumorigenesis. To strengthen the point, the authors should identify the p300 binding region on VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Mutants with defects in p300 binding/recruitment should be generated and included as a control in the related q-PCR and tumorigenic studies. This work will help confirm the crucial role of p300 in mediating the oncogenic effects of these two fusion proteins.

      We thank the reviewer for the suggestion. We have performed additional co-immunoprecipitation experiments using a deletion mutant of VGLL2-NCOA2 and demonstrated that the C-terminal NCOA2 part of the fusion is responsible for mediating the interaction between the fusion protein and CBP/P300. These results are now included in the new Figure 5A and are consistent with the reported structural analysis of the CBP/P300-NCOA complex (8). In addition, our new data showed the inability of the VGLL2-NCOA2 ∆NCOA2 mutant to induce gene transcription (Figure 1-figure supplement 1D). Furthermore, our data using the small-molecule CBP/P300 inhibitor clearly demonstrated that CBP/P300 is required to mediate cell transformation and tumorigenesis induced by the two fusion proteins in vitro and in vivo (Figures 5 and 6).

      (4) Another major issue is the overexpression system extensively used in this study. It is important to determine whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in cancer. If not, the expression levels of the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins should be adjusted to endogenous levels to assess their oncogenic effects on gene transcription and tumorigenesis. This approach would make the study more relevant to the pathological conditions observed in scRMS cancer patients.

      We appreciate the reviewer’s input and acknowledge the limitation of the HEK293T and C2C12 cell-based models that rely on ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. It is currently unclear whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in sarcoma. As mentioned before, these surrogate cell culture systems allowed us to systematically compare the transcriptional regulation by the fusion proteins and YAP/TAZ and elucidate the molecular mechanism underlying the Hippo/YAP-independent oncogenic transformation induced by VGLL2-NCOA2 and TEAD1-NCOA2.

      References:

      (1) Genes Dev. 2007 Nov 1;21(21):2747-61. doi: 10.1101/gad.1602907. Inactivation of YAP oncoprotein by the Hippo pathway is involved in cell contact inhibition and tissue growth control.

      (2) Genes Dev. 2010 Jan 1;24(1):72-85. doi: 10.1101/gad.1843810. A coordinated phosphorylation by Lats and CK1 regulates YAP stability through SCF(beta-TRCP).

      (3) VGLL2-NCOA2 leverages developmental programs for pediatric sarcomagenesis. Watson S, LaVigne CA, Xu L, Surdez D, Cyrta J, Calderon D, Cannon MV, Kent MR, Cell Rep. 2023 Jan 31;42(1):112013.

      (4) Lats1/2 Sustain Intestinal Stem Cells and Wnt Activation through TEAD-Dependent and Independent Transcription. Cell Stem Cell. 2020 May 7;26(5):675-692.e8.

      (5) Yi, P., Yu, X., Wang, Z., and O’Malley, B.W. (2021). Steroid receptor-coregulator transcriptional complexes: new insights from CryoEM. Essays Biochem. 65, 857–866.

      (6) A Molecular Study of Pediatric Spindle and Sclerosing Rhabdomyosarcoma: Identification of Novel and Recurrent VGLL2-related Fusions in Infantile Cases. Am J Surg Pathol. 2016 Feb;40(2):224-35. doi: 10.1097/

      (7) CITED2 and the modulation of the hypoxic response in cancer. Fernandes MT, Calado SM, Mendes-Silva L, Bragança J. World J Clin Oncol. 2020 May 24;11(5):260-274.

      (8) Yu, X., Yi, P., Hamilton, R.A., Shen, H., Chen, M., Foulds, C.E., Mancini, M.A., Ludtke, S.J., Wang, Z., and O’Malley, B.W. (2020). Structural insights of transcriptionally active, full-length Androgen receptor coactivator complexes. Mol. Cell 79, 812–823.e4.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. Studies of larger cohorts will be needed to determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which seemed primarily ensured by studying females only and by having a single experienced observer make the diagnoses.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to the implication of HERVs in ME/CFS. We are indeed considering this avenue in future work to deepen the findings presented here. However, the limited knowledge of HERV-mediated physiological functions may hamper obtaining prompt results on the causes and effects of HERV expression in ME/CFS and FM.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga et al. carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses in this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, fewer than 100 of the 1,290,800 probesets were excluded from our analysis for lack of correspondence with hg38, a number we interpreted as insignificant for the "genome-wide" claims. This aspect will be explained in the revised version of this manuscript.
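
      As a rough check of the scale involved, the excluded fraction can be bounded with simple arithmetic. The sketch below uses only the two figures quoted in the response and treats 100 as an upper bound on the number of excluded probes, since the exact count is not stated; the variable names are illustrative.

```python
# Figures quoted in the response; 100 is an upper bound ("less than 100").
total_probesets = 1_290_800
max_excluded = 100

# Upper bound on the share of probesets lost in the hg19 -> hg38 liftover.
fraction = max_excluded / total_probesets
percent = fraction * 100

print(f"At most {percent:.4f}% of probesets were excluded")
```

      Under these assumptions, the loss stays below one hundredth of one percent of the array content.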

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust to current evidence.

      Revised sentences can now be found in lines 397-402.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERVs as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERVs (cluster 4) are associated with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6); the impact of these associations deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing the genomic locations of DE HERVs in these pathologies would contribute to the development of these findings. Unfortunately, we do not hold the rights to share probe coordinates from this custom HERV-V3 microarray, which we used under an MTA with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations/questions:

      (1) The authors point towards the biomarker potential of HERV expression signatures. In line with this, it would be important to test if they can predict the correct pathology for patients using the expression of DE HERVs. Additionally, as a single clinician annotated the cohort analysed in this study, it would be interesting to validate the signatures identified in this work by reanalysing publicly available transcriptomic data from independent studies.

      Thank you for the suggestion. We plan to conduct this analysis and have added the following statement to the manuscript (lines 482-483): “Given the limited sample size in our cohort, validation of the findings in extended cohorts is a must.”

      (2) The authors suggest that an epigenetic mechanism causes the dysregulated HERV expression in ME/CFS patients. However, in Fig.1A, HERV expression profiles of co-diagnosed patients are more similar to healthy controls than patients with either condition. How could the co-morbidity of FM "rescue" the phenotype of ME/CFS?

      Thank you for the insightful comment. It is notable that co-diagnosed patients exhibit HERV expression profiles more similar to those of healthy controls than to those of patients with either FM or ME/CFS alone. These findings may suggest a distinct underlying pathomechanism for this patient group, supporting the identification of a novel nosologic entity, as discussed in lines 372-374 of the manuscript.

      (3) Abundant evidence in the literature links HERV dysregulation with the production of RNA:DNA hybrids and dsRNAs and viral mimicry. The authors found that ME/CFS subgroup 2, which exhibits the most important HERV dysregulation, is also associated with decreased signatures of pathogen detection. It would be interesting to quantify the abundance of DNA:RNA hybrids and dsRNAs in PBMCs of ME/CFS and FM patients as well as healthy controls. It would be interesting to discuss how downregulation of pathogen detection pathways could be a mechanism in ME/CFS patients to avoid viral mimicry and potential links with inflammation in this disease.

      Certainly, HERVs can influence disease pathophysiology by generating RNA:DNA hybrids and dsRNA. However, microarray data do not allow this analysis. Future work on the mechanisms underlying differentially expressed HERVs could explore this interesting possibility.

      (4) Another intriguing result is how overexpression of Module 3 in ME/CFS subgroup 2 is associated with higher levels of plasma cells. The authors hypothesize that the changes in immune cell abundances reflect previous viral infections, but another possibility would be immune activation against HERVs. Are there protein-coding sequences (gag, pro, pol, env) amongst the HERV sequences of module 3? If so, it would be interesting to validate HERV protein expression in these samples. Additionally, blood samples of ME/CFS patients and healthy controls should be analysed in flow cytometry to describe the abundance and phenotype of immune cells precisely.

      Thank you for your insightful comments. In fact, we identified three HERV elements with protein-coding regions whose functional relevance remains uncertain. They present an interesting avenue for future investigation, particularly regarding immune activation.

      Minor comments:

      (1) On lines 170-172, it is unclear to me how Figure 1E is linked to the text.

      We have added a line better explaining Fig. 1E: “Top 10 contributing HERVs to principal components PC1 and PC2 are shown” (lines 171-172).

      (2) Figure S2: grouping or colouring the plots based on the cluster to which HERVs were assigned could facilitate the understanding of the figure.

      We appreciate the suggestion to enhance the clarity of the figures. However, this color-coding cannot be implemented, as a family is not exclusively assigned to a single cluster.

      (3) How are the 4 HERV clusters of Figure 2 and the 8 modules of Figure 3 related to the clusters identified by hierarchical clustering in Figure 1? More details should be provided in the text (Results and Methods sections), and figures to illustrate the clustering strategy should be added if needed.

      To enhance clarity, we have included the following explanation in the results section (lines 244-251): “To uncover potentially affected physiologic functions linked to DE HERV, we examined how DE HERVs and DE genes with similar expression patterns grouped together in modules based on their intrinsic relationships by their hierarchical co-clustering (Fig. 3). Then, the functional significance of these modules was assessed by gene ontology (GO) analysis of the DE genes within each module. The hierarchical clustering analysis resulted in the identification of eight distinct modules, each characterized by unique combinations of DE HERV and DE gene patterns across all four study groups (Fig. 3)”.

      (4) Related to Figure 4, are there HERV sequences in module 3 located near genes important for plasma cells and/or resting CD4 memory T cells?

      Thank you for your insightful comment. However, gene relevance for plasma cells and/or resting CD4 memory T cells may depend on multiple factors in addition to cell type and subtype and, therefore, the analysis may not be straightforward.

      Reviewer #2 (Recommendations for the authors):

      In Figure 1, the heatmap scale goes from -4 to 4. This should reflect at least the numbers on the lowest and highest end of the scale.

      Thank you for bringing this to our attention. The scale was correct; however, when arranging the panels, the numbers were not properly positioned. The figure has now been updated with the corrected version.

      Figure 2F and G, percentages are shown as decimal numbers up to 1.00, while it should be 100%, and so on.

      We also replaced this figure, changing the numbers to fit percentages.

      It would be interesting to know how the results change using FDR of 0.05. I'm not familiar with microarray thresholds, but in RNA-Seq, 0.1 is rarely used, with 0.05 being the standard. Could it be that a more stringent result better distinguishes the pathologies?

      Applying a more stringent threshold, such as FDR 0.05, may remove sequences that, while not strongly differentially expressed, may still be important for distinguishing between these pathologies. Therefore, we decided to also include DE tendencies (FDR<0.1) in this first-of-its-kind study. Findings will need validation in enlarged cohorts.
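
      To illustrate how the choice between FDR<0.05 and FDR<0.1 changes the number of features called differentially expressed, the sketch below applies a Benjamini-Hochberg adjustment to a small set of p-values. It assumes the FDR in question is a Benjamini-Hochberg adjusted p-value, which the response does not state, and the p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, for illustration only (not from the study).
pvals = [0.001, 0.008, 0.02, 0.04, 0.05, 0.3, 0.6]
qvals = benjamini_hochberg(pvals)

strict = [q < 0.05 for q in qvals]   # conventional threshold
lenient = [q < 0.1 for q in qvals]   # threshold used to include "DE tendencies"

print(sum(strict), "features pass FDR < 0.05")
print(sum(lenient), "features pass FDR < 0.1")
```

      In this toy example the lenient threshold retains features that the strict one drops, which is the trade-off described in the response: more candidate sequences at the cost of a higher expected proportion of false discoveries.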

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the interaction between tissue-resident immune cells (microglia) and circulating systemic neutrophils in response to acute, focal retinal injury. They induced retinal lesions using 488 nm light to ablate photoreceptor (PR) outer segments, then utilized various imaging techniques (AOSLO, SLO, and OCT) to study the dynamics of fluorescent microglia and neutrophils in mice over time. Their findings revealed that while microglia showed a dynamic response and migrated to the injury site within a day, neutrophils were not recruited to the area despite being nearby. Post-mortem confocal microscopy confirmed these in vivo results. The study concluded that microglial activation does not recruit neutrophils in response to acute, focal photoreceptor loss, a scenario common in many retinal diseases.

      Strengths:

      The primary strength of this manuscript lies in the techniques employed.

      In this study, the authors utilized advanced Adaptive Optics Scanning Laser Ophthalmoscopy (AOSLO) to document immune cell interactions in the retina accurately. AOSLO's micron-level resolution and enhanced contrast, achieved through near-infrared (NIR) light and phase-contrast techniques, allowed visualization of individual immune cells without extrinsic dyes. This method combined confocal reflectance, phase-contrast, and fluorescence modalities to reveal various cell types simultaneously. Confocal AOSLO tracked cellular changes with less than 6 μm axial resolution, while phase-contrast AOSLO provided detailed views of vascular walls, blood cells, and immune cells. Fluorescence imaging enabled the study of labeled cells and dyes throughout the retina. These techniques, integrated with conventional histology and Optical Coherence Tomography (OCT), offered a comprehensive platform to visualize immune cell dynamics during retinal inflammation and injury.

      Thank you!

      Weaknesses:

      One significant weakness of the manuscript is the use of Cx3cr1GFP mice to specifically track GFP-expressing microglia. While this model is valuable for identifying resident phagocytic cells when the blood-retinal barrier (BRB) is intact, it is important to note that recruited macrophages also express the same marker following BRB breakdown. This overlap complicates the interpretation of results and makes it difficult to distinguish between the contributions of microglia and infiltrating macrophages, a point that is not addressed in the manuscript.

      We agree that the manuscript should place greater emphasis on the fact that CX3CR1-GFP mice exhibit fluorescence not only in microglia, but also in other cells of macrophage origin, including monocytes, perivascular macrophages and some hyalocytes.

      Through the advantages of in vivo AOSLO, however, we are able to establish that CX3CR1 cells are present within the tissue before the laser lesion is placed. This suggests they are tissue resident. We agree that it is possible that at later time points (days-weeks), systemic macrophages and/or monocytes may participate. The lack of rolling/crawling cells suggests they are not systemic. We elaborate on this point in a new section in the discussion:

      P29 L534-541:

      “CX3CR1-GFP mice exhibit fluorescence not only in microglia

      We recognize that the CX3CR1-GFP model can also label systemic cells such as monocytes/macrophages77. While it is possible these cells could infiltrate the retina in response to the lesion, we find it unlikely since there was no indication of the leukocyte extravasation cascade (rolling/crawling/stalled cells) within the nearest retinal vasculature. In addition to microglia, retinal perivascular macrophages and hyalocytes also exhibit GFP fluorescence and thus that these cells may also contribute toward damage resolution.”

      Another major concern is the time point chosen for analyzing the neutrophil response. The authors assess neutrophil activity 24 hours after injury, which may be too late to capture the initial inflammatory response. This delayed assessment could overlook crucial early dynamics that occur shortly after injury, potentially impacting the overall findings and conclusions of the study.

      The power of in vivo imaging makes these early assessments possible. Therefore, we have addressed the reviewer’s concern by conducting an additional experiment that examines whether neutrophils are seen in the window of time between the lesion and 24 hours. In a newly examined mouse, we find that within 3.5 hours post-lesion, neutrophils do not extravasate adjacent to the lesion site (see new “figure 8 – figure supplement 1”).

      Also see accompanying video (new “figure 8 – video 3”) for an example of nearby neutrophils flowing through OPL capillaries just microns away from the lesion site. Neutrophils are clearly contained within the vasculature and exhibit dynamics consistent with healthy retinal tissue. While it remains possible that the lesion may increase leukocyte stalling within the nearest capillaries, we are unable to confirm or deny this with a single experiment. We now submit this evidence as a new supplementary figure following the reviewer’s suggestion.

      Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by an ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      We appreciate this recognition and hope that the reviewer considers the weaknesses below in the context of the paper’s identified strengths.

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      We agree and have taken the following steps to address this:

      (1) Paper has been shortened overall by 8%

      (2) We reorganized the following sections:

      a. Introduction: shortened

      b. Methods: merged section “Ex vivo confocal image processing” with “Ex vivo confocal imaging”.

      c. Results: most sections shortened, others simplified for concision

      d. Discussion: most sections shortened, removed “Microglial/neutrophil discrimination using label-free phase contrast”

      e. Figure references reorganized in order of their appearance.

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury.

      On the heels of this burgeoning technology, we consider this report among the first studies of its kind. We are hopeful that it forms the foundation of many further investigations to come. We expect a rich parameter space to be explored with future studies including investigation of other time points, other injuries of varying degree and other immune cell populations (along with their interactions with each other). Each has the potential to reveal the complexities of the ocular immune system in action.

      Reviewer #3 (Public review):

      Summary:

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature, and the pigment epithelium. As early as 1-day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation at the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience. However, there are some issues with the conclusions drawn from the data and analysis that can be addressed to further bolster the manuscript.

      Strengths:

      Adaptive optics imaging of the murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is beneficial for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      Thank you!

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used, (a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, (b) two different models of inflammation - endotoxin-induced uveitis (EAU) and laser ablation are used to study differences in the immune interaction.

      Thank you!

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. While not directly shown in the article, this model would potentially allow for controlling the size, depth, and severity of the laser injury opening interesting avenues for future study.

      We agree that there is an established community that is invested in developing titrated dosimetry for light damage models. As the reviewer recognizes, this parameter space is exceptionally large; therefore, we controlled this parameter by choosing a single wavelength that is commonly used in ophthalmoscopy (488 nm) and a fixed duration and exposure regime that created reproducible, mild damage of photoreceptors. At this titration we created a mild lesion that spares the retina above and below.

      Weaknesses:

      (1) It is unclear based on the current data/study to what extent the mild laser damage phenotype is generalizable to disease phenotypes. The outer nuclear cell loss of 28% and a complete recovery in 2 months would seem quite mild, thus the generalizability in terms of immune-mediated response in the face of retinal remodeling is not certain, specifically whether the key finding regarding the lack of neutrophil recruitment will be maintained with a stronger laser ablation.

      It seems the concern here is whether our finding is generalizable to other damage regimes, especially more severe ones. While speculative, we suspect that it is not generalizable to lesions of greater severity. For example, puncturing Bruch’s membrane is a more severe phenotype that is often encountered in laser damage. However, this creates a complicated model that not only induces inflammation, but also compromises BRB integrity and promotes CNV. The parameter space raised by the reviewer’s question is quite vast, and we have therefore tried to summarize the generalizability within our manuscript in:

      P31 L586-588 “There are limitations on how generalizable this mild damage is to more severe damage or disease phenotypes, but this acute damage model can begin to provide clues about how immune cells interact in response to PR loss. In this laser lesion model, we ablate 27% of the PRs in a 50 µm region.”

      (2) Mice numbers and associated statistics are insufficient to draw strong conclusions in the paper on the activity of neutrophils, some examples are below:

      a) 2 catchup mice and 2 positive control EAU mice are used to draw inferences about immune-mediated activity in response to injury. If the goal was to show 'feasibility' of imaging these mouse models for the purposes of tracking specific cell type behavior, the case is sufficiently made and already published by the authors earlier. It is possible that a larger sample size would alter the conclusion.

      We would like to highlight that the total number of mice studied in this report was 28 (18 in-vivo imaging, 10 ex-vivo histology, >40 lesions total). While power analysis is challenging as these are the first studies of their kind, we underscore that in vivo imaging allows those same mice to be studied multiple times longitudinally. This is not possible with traditional histology. Therefore, in vivo imaging not only reveals the temporal progression (unlike histology), but also increases the number of observations beyond a simple count of the “number of mice”.

      The goal of the study was not one of feasibility. The goal was to address a specific question in ocular biology: “Do resident CX3CR1 cells recruit neutrophils in early, regional retinal injury?”

      The low numbers that the reviewer points to are not the primary data of the paper but rather supportive control data. Moreover, we refocus attention on the fact that our study was performed on 28 mice across multiple modalities, each of which corroborates a common finding: neutrophils do not appear to be recruited despite a strong microglial response. This is a central finding of the paper.

      b) There are only 2 examples of extravasated neutrophils in the entire article, shown in the positive control EAU model. With the rare extravasation events of these cells and their high-speed motility, the chance of observing their exit from the vasculature is likely low overall, therefore the general conclusions made about their recruitment or lack thereof are not justified by these limited examples shown.

      The spirit of the challenge raised is that seeing nothing is not proof that nothing occurred. Said more commonly, “absence of evidence is not evidence of absence”, a quote often attributed to Carl Sagan. Yet we push back on this conjecture, as we have shown, not only with cutting-edge in vivo imaging but also with ample histological controls and multiple transgenic animals (and corroborating IHC antibodies), that in none of these imaging modalities, at none of the time points we evaluated, did neutrophils aggregate or extravasate in response to photoreceptor ablation.

      Reviewer adds: “the chance of observing their exit from the vasculature is likely low overall…”

      This is the reason that we specifically chose a focal lesion model: to increase any possible chance of imaging a rare event. The focal lesion provides both a time and a location for “where” to look. Small 50-micrometer lesions were sufficient to drive a strong local microglial response (Figures 5, 6, 9). This was evidence that local inflammatory cues were present. Yet despite this activation, neutrophils were not recruited to this location. We emphasize that this is a strength of our approach over other pan-retinal damage models that may indeed miss the rare extravasation events that are geographically sparse and happen over hours.

      c) In Figure 3, the 3-day time point post laser injury shows an 18% reduction in the density of ONL nuclei (p-value of 0.17 compared to baseline). In the case of neutrophils, it is noted that "Control locations (n = 2 mice, 4 z-stacks) had 15 ± 8 neutrophils per sq.mm of retina whereas lesioned locations (n = 2 mice, 4 z-stacks) had 23 ± 5 neutrophils per sq.mm of retina (Figure 10b). The difference between control and lesioned groups was not statistically significant (p = 0.19)." These data both come from histology. While the p-values - 0.17 and 0.19 - are similar, in the first case a reduction in ONL cell density is concluded while in the latter, no difference in neutrophil density is inferred in the lesioned case compared to control. Why is there a difference in the interpretation where the same statistical test and methodology are used in both cases? Besides this statistical nuance, is there an alternate possibility that there is an increased, albeit statistically insignificant, concentration of circulating neutrophils in the lesioned model? The increase is nearly 50% (15 ± 8 vs. 23 ± 5 neutrophils per sq.mm) and the reader may wonder if a larger animal number might skew the statistic towards significance.

      The statistics and p-values depend on the analysis strategy. As described in the methods, we used a predetermined 50-micron cylinder for our counting analysis, based on the average lesion size created. This circular window roughly approximates the common lesion size. However, recall that the damage is created along a single axis (a line projected on the retina); it is therefore possible that the analysis region is too generous to capture the exceptionally local damage.
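
      The counting strategy described above can be sketched as follows. This is a minimal illustration only, not the authors’ analysis code: the function name `count_in_cylinder`, the centroid format (x, y, z in microns), and the example coordinates are all hypothetical. The sketch counts centroids that lie within a 50-micron-diameter disc around the lesion center and, optionally, within a depth range standing in for a retinal layer.

```python
import math

def count_in_cylinder(centroids, center_xy, diameter_um=50.0, z_range=None):
    """Count centroids (x, y, z in microns) inside an upright cylinder.

    Hypothetical helper for illustration only; not the authors' code.
    """
    radius = diameter_um / 2.0
    cx, cy = center_xy
    count = 0
    for x, y, z in centroids:
        # Lateral test: is the centroid within the circular analysis window?
        inside_disc = math.hypot(x - cx, y - cy) <= radius
        # Axial test: is the centroid within the chosen depth range (if any)?
        inside_depth = z_range is None or z_range[0] <= z <= z_range[1]
        if inside_disc and inside_depth:
            count += 1
    return count

# Made-up centroids: two inside the cylinder, one outside laterally,
# and one outside the depth range.
example = [(0.0, 0.0, 10.0), (20.0, 10.0, 15.0), (30.0, 0.0, 12.0), (5.0, 5.0, 80.0)]
n_inside = count_in_cylinder(example, center_xy=(0.0, 0.0), z_range=(0.0, 50.0))
```

      Because a circular window of this kind is wider than line-shaped damage, it would be expected to pool damaged and spared tissue, which is consistent with the dilution effect described in the response.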

      While the reviewer is focused on the nuance of statistics, we would like to refocus the conversation on our data showing that very few neutrophils were observed at all (105 cells from 8 locations, p-value reported). What the above critique misses is that all neutrophils were contained within capillaries (Fig 10). We found no examples of extravasated neutrophils. This is the major finding and is supported by our in vivo as well as ex vivo confirmation.
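
      For readers who want to gauge the size of the neutrophil density difference under discussion, the comparison can be approximated from the reported summary values. This sketch rests on assumptions that the source does not confirm: that the ± values are standard deviations, that the 4 z-stacks per group are the unit of analysis, and that an unequal-variance (Welch) t statistic is a reasonable stand-in. The authors’ actual test may differ, so the result is illustrative and is not expected to reproduce the reported p-value.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test."""
    # Squared standard error contributed by each group.
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Reported summary values: control 15 ± 8, lesioned 23 ± 5 neutrophils per sq.mm.
# ASSUMPTION: ± is a standard deviation and n = 4 z-stacks per group.
t_stat, dof = welch_t(15.0, 8.0, 4, 23.0, 5.0, 4)
```

      Under these assumptions the statistic is modest and the degrees of freedom are few, which illustrates why a roughly 50% difference in means can still fail to reach significance with this sample size.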

      (2) The conclusions on the relative activity of neutrophils and microglia come from separate animals. The reader may wonder why simultaneous imaging of microglia and neutrophils is not shown in either the EAU mice or the fluorescently labeled catchup mice where the non-labeled cell type could possibly be imaged with phase-contrast as has been shown by the authors previously. One might suspect that the microglia dynamics are not substantially altered in these mice compared to the CX3CR1-GFP mice subjected to laser lesions, but for future applicability of this paradigm of in vivo imaging assessment of the laser damage model, including documenting the repeatability of the laser damage model and the immune cell behavior, acquiring these data in the same animals would be critical.

      A double fluorescent mouse (neutrophils and microglia) is a logical next step of this research. In fact, we have now crossed these transgenic mice and are studying this double-labeled mouse in a second manuscript in preparation. However, for this study, it was imperative that the fluorescent imaging light was kept at low levels so as not to contribute to or alter the lesion phenotype and accompanying immune response. Imaging two fluorescent channels to simultaneously view neutrophils and microglia in the same animal would therefore have required at least 2X the visible light exposure. The imaging light levels used in the current study were carefully examined in our previous publications so as not to create additional light damage (Joseph et al 2021).

      (3) Along the same lines as above, the phase contrast ONL images at time points from 3-day to 2-month post laser injury are not shown and the absence of this data is not addressed. This missing data pertains only to the in vivo imaging mice model but are conducted in histology that adequately conveys the time-course of cell loss in the ONL.

      The ocular preparation used for the phase contrast data in Figure 2 unfortunately developed an anesthesia-induced cataract that precluded adequate image quality. This is not uncommon in long-term mouse ocular imaging preparations (Feng et al 2023). Instead, we chose to include the phase-contrast data for baseline and 1 day, which show the visually compelling contrast between intact and disrupted ONL and demonstrate that the damage is not only focal but also clearly disrupts the somatic layers of the photoreceptors.

      It is suggested that the reason be elaborated for the exclusion of this data and the simultaneous imaging of microglia and neutrophils mentioned above.

      We agree and we have included the reason for the “not acquired” data within the figure 2 legend:

      “Phase contrast data was not acquired for time points 3 days-2 months due to development of cataract which obscured the phase contrast signal”

      Also, it would be valuable to further qualify and check the claims in the Discussion that "ex vivo analysis confirms in vivo findings" and "Microglial/neutrophil discrimination using label-free phase contrast"

      We maintain that ex vivo analysis both corroborates and, in many cases, confirms our in vivo findings. We feel this is a strength of our manuscript rather than a qualifier. A) Damage localization is visible with OCT and confocal/phase contrast AOSLO in a region that matches the DAPI loss we see ex vivo. B) Disruption of the ONL seen with in vivo AOSLO is of the same size, shape and location as the ONL damage quantified ex vivo. C) No damage or disruption was seen in locations above the lesion with OCT or AOSLO, which matches our finding that only the ONL shows loss of nuclei whereas other more superficial layers are spared. D) Microglial localization is found both in vivo and ex vivo. E) Neutrophil aggregation or extravasation was seen neither in vivo nor ex vivo. Given the evidence above, we contend that this synergistic and complementary approach corroborates the experimental data through two independent ways of studying this tissue.

      We agree that the claims made in the section entitled “Microglial/neutrophil discrimination using label-free phase contrast” are not strongly supported by the phase-contrast imaging presented in this paper. Accordingly, we have since removed this section based on reviewer suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Based on the title and abstract, the main focus of the manuscript appears to be the immune response. However, most of the manuscript is dedicated to the authors' imaging technique. Additionally, several important concerns regarding the investigation of the immune response in the retina need to be addressed.

      We understand that the emphasis may appear to be on the imaging technique. However, because AOSLO is not a widely used technology, we are committed to explaining the technique so that it builds both awareness of and confidence in the way these exciting new data are acquired.

      (2) The authors indicate '1 day post-injury' as a timeframe spanning between 18 and 28 hours post-injury. This is a rather wide window of time, which could potentially affect the analysis. It is necessary to demonstrate that there is no significant difference in the immune response, particularly in terms of microglial morphology and branch orientation, between 18 and 28 hours post-injury.

      We agree that a fine time scale may show even greater insight to the natural history of the inflammatory response. However, we feel that our chosen time points go above and beyond the temporal precision that is offered by other investigations, especially considering the novel multi-modal imaging performed here. Studies using finer temporal sampling are poised for future investigation.

      (3) The authors should consider using additional markers or complementary techniques to differentiate between microglia and recruited macrophages, such as incorporating immunohistochemistry with P2RY12, a specific marker for microglia that helps distinguish them from macrophages, and CD68 or F4/80, markers for recruited macrophages. It is also crucial for the authors to include a discussion addressing the limitations of using Cx3cr1GFP mice and the potential impact on result interpretation. It is fundamental to validate the findings and clarify the roles of microglia and macrophages.

      The wonder of current IHC is that there are myriad antibodies and labels that “could” be used. We used what we felt were the most compelling for this stage of early investigation. We look forward to studies that employ this wider range of labels. See our response to reviewer 1’s first comment above for addressing the limitations of using Cx3CR1 mice.

      (4) Analyzing neutrophil responses at 24 hours post-injury may be too late to capture the critical early dynamics of inflammation. By this time, the initial recruitment and activation phases of neutrophils may have already peaked or begun to resolve, potentially missing key insights into the immediate immune response. The authors should conduct additional analysis of neutrophil responses at earlier time points post-injury, such as 6 or 12 hours. Including these time points would provide a more comprehensive and conclusive analysis of the neutrophil response, helping to delineate the progression of inflammation and its implications for subsequent healing processes.

      This point has been addressed above. Briefly, we have now included a new experiment (and figure + video) that shows no neutrophil extravasation at earlier time points. We thank the reviewer for this helpful suggestion.

      Reviewer #2 (Recommendations for the authors):

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      (1) There was a lengthy description and verification of light-induced injury and longitudinal tracking of healing, which I believe can be further cleaned up and made more succinct.

      We have cleaned up and reorganized the manuscript (see above response for details) and reduced its length by 8%.

      (2) The intention/goal of the paper can be further strengthened. On page 33: "to what extent do neutrophils respond to acute neural loss in the retina?" This particular statement is so clear and really brings out the purpose of this study, and it will be great to see something like this in the opening statement.

      We thank the reviewer for this excellent suggestion. We have modified the final paragraph of the introduction to strengthen our study’s intention.

      P4 L45-47: Here, we ask the question: “To what extent do microglia/neutrophils respond to acute neural loss in the retina?” To begin unraveling the complexities in this response, we deploy a deep retinal laser ablation model.

      (3) The figures are not mentioned in the manuscript in the order they were numbered. It makes it extremely challenging to follow along. The methods/results sections started with Figure 1, then on to Figure 4, then back to Figures 2 and 3, etc. This reviewer recommends re-organizing figures and their order of appearance so the contents of the figures are referred to in the paragraph in the most efficient and clear manner.

      We have re-organized the appearance of figure references throughout the paper.

      (4) Figure 2: phase contrast was not acquired on days 3, 7, and 2 months. Please briefly explain the reason in the caption.

      Addressed above.

      (5) Figure 4 OPL layer, the area highlighted in a dashed circle was meant to demonstrate that perfusion was intact, but I cannot see the flow in the highlighted area very well at day 7 and 2 months (especially 2 months). Please explain.

      Perfusion maps are often difficult to interpret as a static image. Therefore, we have additionally provided the raw video data (“OPL_vasculature_7d” and “OPL_vasculature_2mo”) which helps visualize active perfusion. To the reviewer’s point, videos reveal that RBC motion is maintained in the capillaries of this location.

      (6) While there's a thorough discussion of the biological impact of the finding, the uniqueness of the imaging technique can be better highlighted. Immune response toward injury is highly dynamic and is often the first step of wound healing. To observe such dynamic events longitudinally in the living eye at the cellular level, it requires a special imaging technique such as the type addressed here. The author can better address the technical uniqueness of studying this type of biological event for readers less familiar with AOSLO.

      We agree and following the reviewer’s suggestion have further emphasized the advance in the current manuscript in two additional places:

      (1) Within the introduction

      P3-4 L21-42: “A missed window of interaction is highly problematic in histological study where a single time point reveals a snapshot of the temporally complex immune response, which changes dynamically over time. Here, we use in vivo imaging to overcome these constraints.

      Documenting immune cell interactions in the retina over time has been challenged by insufficient resolution and contrast to visualize single cells in the living eye. The microscopic size of immune cells requires exceptional resolution for detection. Recently, advances in AOSLO imaging have provided micron-level resolution and enhanced contrast for imaging individual immune cells in the retina and without requiring extrinsic dyes(7,23). AOSLO provides multi-modal information from confocal reflectance, phase-contrast and fluorescence modalities, which can reveal a variety of cell types simultaneously in the living eye. Here, we used confocal AOSLO to track changes in reflectance at cellular scale. Phase-contrast AOSLO provides detail on highly translucent retinal structures such as vascular wall, single blood cells(27–29), PR somata(30), and is well-suited to image resident and systemic immune cells.(7,23) Fluorescence AOSLO provides the ability to study fluorescently-labeled cells(25,31,32) and exogenous dyes(27,33) throughout the living retina. These modalities used in combination have recently provided detailed images of the retinal response to a model of human uveitis.(23,34) Together, these innovations now provide a platform to visualize, for the first time, the dynamic interplay between many immune cell types, each with a unique role in tissue inflammation.”

      (2) Within the discussion

      P34-35 L656-662 “Beyond the context of this specific finding, we share this work with the excitement that AOSLO cellular level imaging may reveal the interaction of multiple immune cell types in the living retina. By using fluorophores associated with specific immune cell populations, the complex dynamics that orchestrate the immune response may be examined in this specialized tissue. This work and future studies may reveal further insights to the interactions of single immune cells in the living body in a non-invasive way.”

      Reviewer #3 (Recommendations for the authors):

      Some other comments:

      (1) The reader may wonder why if all findings are confirmed by histology would an in vivo imaging model be needed. This does not need a generalized explanation given the typical virtues of an in vivo model, but perhaps the authors may want to amplify their findings in the current context, for example, those on the shorter minutes to hours timescales (Figure 2, Supplement 1) that would have been resource and time intensive, and likely impossible, to gather via histology alone.

      The reviewer appropriately underscores the utility of in vivo imaging above histological-only investigation. In response, we have added text in the introduction to emphasize the nuanced, but important value of both longitudinal imaging as well as dynamic imaging which is not possible with conventional histology (e.g. blood perfusion status, immune cell interactions etc.)

      P3-4 L21-42 (these points also addressed in response to reviewer #2 above)

      (2) A few questions and comments on the laser ablation model

      - It is alluded to in the Discussion in Lines 519-521 that the procedure is highly reproducible (95%) but the associated data for this repeatability metric is not shown.

      We agree that the criterion for determining a “successful lesion” requires further elaboration. Therefore, we have now included the criteria for successful lesions in the methods as well as discussion (in bullet below):

      Methods:

      P9-10 L129-133: “This protocol produced a hyper-reflective phenotype in the >40 locations across 28 mice. In rare cases, the exposure yielded no hyper-reflective lesion and were often in mice with high retinal motion, where the light dosage was spread over a larger retinal area. These locations were not included in the in-vivo or histological analysis.”

      - The methods state that a 24 x 1-micron line is focused on the retina, but all lesions seem to appear elliptical where the major to minor axis ratio is a lot smaller than this intended size. One wonders what leads to this discrepancy.

      We expect that this observation is related to the response above; we have added the following:

      Discussion:

      P27 L497-505: “The damage took on an elliptical form, likely due to: 1) Eye motion from respiration and heart rate which spreads the light over a larger integrative area (rather than line). 2) The impact of focal light scatter. 3) A micron-thin line imparting damage on cells that are many microns across manifesting as an ellipse. The majority of light exposures produced lesions of this elliptical shape. In a few conditions, for the reasons described above, the exposure failed to produce a strong, focal damage phenotype. To improve lesion reproducibility, future experiments should control for subtle eye motion affecting light damage, especially for long exposures.”

      (3) Lastly, a thickening is noted in the ONL after laser injury that seems to cause a thinning of the INL as well (Figure 3) which may increase the apparent INL nuclei density.

      The reviewer’s careful eye finds local swelling after injury. However, despite swelling, the segregation between INL and ONL was maintained in all days we examined. Thus, no ONL cells were included in INL counts (see figure 3A & 3D).

      Also, the ONL - inner (panel B) seems to show a little reduction in cell density in the same elliptical shape as the outer ONL in panel C.

      We agree with this observation, which was one of the reasons we included this detailed analysis of both the inner and outer half of the ONL. Our finding is that there is more prominent loss of nuclei in the outer half of the ONL. While the mechanism for this is not understood, we felt it was an important finding to include, and it further shows the axial specificity of the light damage we are inducing (especially at the day 1 observation).

      Lastly, the reduction in nuclear density is visually obvious in the ONL at the 1 and 3-day time points but the p-statistic does not seem to convey this. One may consider performing the analysis on panel F on a smaller region surrounding the lesion to more reliably reveal these effects.

      Related to the response above, the ONL shows a persistence of nuclei in the inner half of that layer, whereas the outer half shows a visible reduction. Therefore, we expect that the reviewer is correct that a statistical analysis that considers just the outer half of the ONL would likely show strong statistical significance. The challenge, however, is that our analysis strategy counted all cells within a 50-micron-diameter cylinder through the entirety of the ONL (meaning strong loss in the outer half was attenuated by weak loss in the inner half). A more detailed sub-layer analysis is difficult because the notable retinal remodeling over days to weeks prevents layers within the ONL from serving as viable landmarks for the requested analysis.

      (4) In Figure 6, the NIR confocal image and fluorescent microglia seem to share the same shape, starting from the OPL and posterior to it. This is particularly evident in the 3 and 7-day time points in the ONL and ONL/IS images. This departs from lines 567-577 where the claim is made that the hyperreflective phenotype in NIR images does not emerge from the microglia and neutrophils. This discrepancy should be clarified. It may be so that the hyperreflective phenotype as observed by Figure 2 at shorter timescales is not related to the microglia but the locus of hyper-reflections changes at longer time scales to involve the microglia as well as in Figure 6. One potential clue/speculation of the common shapes/size in confocal hyper-reflectance and fluorescent microglia of Figure 6 comes from Figure 9 where the microglia seem to engulf the photoreceptor phagosomes in the DAPI stains. It is possible that the hyper-reflections arise from the phagosomes but their co-localization with microglia seems to demonstrate a shared size/shape. As an addendum to the first point, such correlations are a power of the in vivo model and impossible to achieve in histology.

      The reviewer shows a deep understanding of our data. We agree with many of the points, but for the purpose of the paper many of the above offerings are speculative, and we have chosen not to elaborate on them because they cannot be established definitively from the data. Instead, we direct the reader to an important finding that within hours, the hyper-reflective phenotype is seen in both OCT and AOSLO, whereas microglial somas/processes have not yet migrated into the hyper-reflective region. We have now emphasized this point in the discussion section:

      P29-30 L543-552: “A common speculation is that the increased backscatter may arise from local inflammatory cells that activate or move into the damage location. In our data, confocal AOSLO and OCT revealed a hyperreflective band at the OPL and ONL after 488 nm light exposure (Figure 2a, b). We found that the hyperreflective bands appeared within 30 minutes after the laser injury, preceding any detectable microglial migration toward the damage location (Figure 2 – figure supplement 1 and Figure 6 – figure supplement 1). We thus conclude that the initial hyperreflective phenotype is not caused by microglial cell activity or aggregation.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This work presents a valuable self-supervised method for the segmentation of 3D cells in microscopy images, alongside an implementation as a Napari plugin and an annotated dataset. While the Napari plugin is readily applicable and promises to eliminate time consuming data labeling to speed up quantitative analysis, there is incomplete evidence to support the claim that the segmentation method generalizes to other light-sheet microscopy image datasets beyond the two specific ones used here.

      Technical Note: We showed the utility of CellSeg3D in the first submission and in our revision on 5 distinct datasets; 4 of which we showed F1-Score performance on. We do not know which “two datasets” are referenced. We also already showed this is not limited to LSM, but was used on confocal images; we already limited our scope and changed the title in the last rebuttal, but just so it’s clear, we also benchmark on two non-LSM datasets.

      In this revision, we have additionally extended our benchmarking of Cellpose and StarDist to all 4 benchmark datasets, where WNet3D (our novel contribution of a self-supervised model) outperforms or matches these supervised baselines. Moreover, we perform rigorous testing of our model’s generalization by training on one dataset and testing generalization to the other 3; we believe this is on par with (or beyond) what most cell segmentation papers do, thus we hope that “incomplete” can now be updated.
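
      F1-score benchmarks for instance segmentation are commonly computed by matching predicted objects to ground-truth objects at an intersection-over-union (IoU) threshold. The sketch below illustrates that general idea only. It is not the CellSeg3D evaluation code, and the function names, the representation of each object as a set of voxel coordinates, the greedy matching, and the 0.5 threshold are all assumptions chosen for brevity.

```python
def iou(a, b):
    """Intersection over union of two sets of voxel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def f1_at_iou(pred_objects, true_objects, threshold=0.5):
    """F1 score after greedy one-to-one matching at an IoU threshold.

    Hypothetical helper for illustration; real benchmarks may match differently.
    """
    unmatched_true = list(true_objects)
    true_positives = 0
    for pred in pred_objects:
        # Pick the remaining ground-truth object that overlaps this prediction most.
        best = max(unmatched_true, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched_true.remove(best)
    false_positives = len(pred_objects) - true_positives
    false_negatives = len(true_objects) - true_positives
    denom = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denom if denom else 0.0

# Made-up toy volume: one prediction overlaps a true cell, one is spurious,
# and one true cell is missed.
truth = [{(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}, {(5, 5, 5), (5, 5, 6)}]
preds = [{(0, 0, 0), (0, 0, 1), (0, 1, 0)}, {(9, 9, 9)}]
score = f1_at_iou(preds, truth)
```

      In this toy case there is one true positive, one false positive, and one false negative, so the score is 0.5. A cross-dataset generalization test of the kind described above would apply the same metric to a model trained on one dataset and evaluated on the others.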

      Public Reviews:

      Reviewer #1 (Public review):

      This work presents a self-supervised method for the segmentation of 3D cells in microscopy images, an annotated dataset, as well as a napari plugin. While the napari plugin is potentially useful, there is insufficient evidence in the manuscript to support the claim that the proposed method is able to segment cells in other light-sheet microscopy image datasets than the two specific ones used here.

      Thank you again for your time. We had already benchmarked the performance of WNet3D (our 3D SSL contribution) on four datasets; thus, we do not know which two you refer to. Moreover, we have now additionally benchmarked Cellpose and StarDist on all four, so readers can see that WNet3D outperforms or matches these supervised methods on all datasets.

      I acknowledge that the revision is now more upfront about the scope of this work. However, my main point still stands: even with the slight modifications to the title, this paper suggests to present a general method for self-supervised 3D cell segmentation in light-sheet microscopy data. This claim is simply not backed up.

      We respectfully disagree; we benchmark on four 3D datasets: three curated by others and used in learning ML conference proceedings, and one that we provide that is a new ground truth 3D dataset - the first of its kind - on mesoSPIM-acquired brain data. We believe benchmarking on four datasets is on par (or beyond) with current best practices in the field. For example, Cellpose curated one dataset and tested on held-out test data on this one dataset (https://www.nature.com/articles/s41592-020-01018-x) and benchmarked against StarDist and Mask R-CNN (two models). StarDist (Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy) benchmarked on two datasets and against two models, IFT-Watershed and 3D U-Net. Thus, we feel our benchmarking on more models and more datasets is sufficient to claim our model and associated code is of interest to readers and supports our claims (for comparison, Cellpose’s title is “Cellpose: a generalist algorithm for cellular segmentation”, which is much broader than our claim).

      I still think the authors should spell out the assumptions that underlie their method early on (cells need to be well separated and clearly distinguishable from background). A subordinate clause like "often in cleared neural tissue" does not serve this purpose. First, it implies that the method is also suitable for non-cleared tissue (which would have to be shown). Second, this statement does not convey the crucial assumptions of well separated cells and clear foreground/background differences that the method is presumably relying on.

      We have now expanded the manuscript quite significantly. To be clear, we did show that our method works on non-cleared tissue; the Mouse Skull, 3D platynereis-Nuclei, and 3D platynereis-ISH-Nuclei datasets are not cleared tissue, and not all were acquired with LSM; some were acquired with confocal microscopy. We attempted to make that clearer in the main text.

      Additionally, we do not believe cells need to be well separated or the background perfectly clean. We removed statements like "often in cleared neural tissue", expanded the benchmarking, and added a new demo figure for readers to judge. As in the last rebuttal, we provide video evidence (https://www.youtube.com/watch?v=U2a9IbiO7nE) of WNet3D working on the Mouse Skull dataset, which is densely packed and hard for a human to segment, and we linked this video directly in the figure caption.

      We have re-written the main manuscript in an attempt to clarify the limitations, including a dedicated “limitations” section. Thank you for the suggestion.

      It does appear that the proposed method works very well on the two investigated datasets, compared to other pre-trained or fine-tuned models. However, it still remains unclear whether this is because of the proposed method or the properties of those specific datasets (namely: well isolated cells that are easily distinguished from the background). I disagree with the authors that a comparison to non-learning methods "is unnecessary and beyond the scope of this work". In my opinion, this is exactly what is needed to prove that CellSeg3D's performance cannot be matched with simple image processing.

      We want to stress again that we benchmarked WNet3D on four datasets, not two. We have now additionally added benchmarking with Cellpose, StarDist, and a non-deep-learning method, as requested (see new Figures 1 and 3).

      As I mentioned in the original review, it appears that thresholding followed by connected component analysis already produces competitive segmentations. I am confused about the authors' reply stating that "[this] is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning". The methods against which CellSeg3D is compared are CellPose and StarDist, both are deep-learning based methods.

      That those methods do not perform well on this dataset does not imply that a simpler method (like thresholding) would not lead to competitive results. Again, I strongly suggest the authors include a simple, non-learning based baseline method in their analysis, e.g.: * comparison to thresholding (with the same post-processing as the proposed method) * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      We added a non-deep learning based approach, namely, comparing directly to thresholding with the same post hoc approach we use to go from semantic to instance segmentation. WNet3D (and other deep learning approaches) perform favorably (see Figure 2 and 3).
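
The kind of non-learning baseline discussed here (thresholding followed by connected component analysis) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the mean-intensity default threshold, the `min_size` filter, and the function name `threshold_instance_baseline` are assumptions made for this example, and the post hoc instance step used in the paper is not reproduced.

```python
import numpy as np
from scipy import ndimage


def threshold_instance_baseline(volume, threshold=None, min_size=10):
    """Label connected foreground blobs in a 3D volume.

    threshold: intensity cut-off; defaults to the volume mean (an arbitrary
    choice for this sketch, not the rule used in the paper).
    min_size: components with fewer voxels are discarded as noise.
    """
    if threshold is None:
        threshold = volume.mean()
    foreground = volume > threshold
    # Default structuring element: face-connected (6-connectivity in 3D).
    labels, n_found = ndimage.label(foreground)
    # Voxel count per label; index 0 is the background.
    sizes = np.bincount(labels.ravel(), minlength=n_found + 1)
    keep = sizes >= min_size
    keep[0] = False
    # Relabel the surviving components consecutively from 1.
    remap = np.zeros(n_found + 1, dtype=np.int32)
    remap[keep] = np.arange(1, keep.sum() + 1)
    return remap[labels]


# Toy volume: two bright cubes on a dark background.
toy = np.zeros((20, 20, 20), dtype=np.float32)
toy[2:6, 2:6, 2:6] = 1.0
toy[12:17, 12:17, 12:17] = 1.0
instances = threshold_instance_baseline(toy, threshold=0.5)
n_instances = int(instances.max())
```

On real data, touching nuclei merge into a single component under this scheme, which is why an additional splitting step is usually needed for crowded samples.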

      Regarding my feedback about the napari plugin, I apologize if I was not clear. The plugin "works" as far as I tested it (i.e., it can be installed and used without errors). However, I was not able to recreate a segmentation on the provided dataset using the plugin alone (see my comments in the original review). I used the current master as available at the time of the original review and default settings in the plugin.

      We updated the plugin and code for the revision at your request to make this possible directly in the napari GUI, in addition to our scripts and Jupyter Notebooks (please see main and/or run `pip install --upgrade napari-cellseg3d`; the current version is 0.2.1). Of course, this means the original submission code (May 2024) does not have this in the GUI, so you would need to update to test it. Alternatively, you can watch the demo video we now provide: https://www.youtube.com/watch?v=U2a9IbiO7nE (we understand that testing code takes a lot of time and commitment).

      We greatly thank the reviewer for their time, and we hope our clarifications, new benchmarking, and rewrite of the paper now allow them to change their assessment from incomplete to a more favorable and reflective eLife adjective.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      -  The idea behind the self-supervised learning loss is interesting.

      -  It provides a new annotated dataset for an important segmentation problem.

      -  The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      -  The comparison to other methods on the provided dataset is extensive and experiments are reproducible via public notebooks.

      Weaknesses:

      The experiments presented by the authors support the core claims made in the paper. However, they do not convincingly prove that the method is applicable to segmentation problems with more complex morphologies or more crowded cells/nuclei.

      Major weaknesses:

      (1) The method only provides functionality for semantic segmentation outputs and instance segmentation is obtained by morphological post-processing. This approach is well known to be of limited use for segmentation of crowded objects with complex morphology. This is the main reason for prediction of additional channels such as in StarDist or CellPose. The experiments do not convincingly show that this limitation can be overcome as model comparisons are only done on a single dataset with well separated nuclei with simple morphology. Note that the method and dataset are still a valuable contribution with this limitation, which is somewhat addressed in the conclusion. However, I find that the presentation is still too favorable in terms of the presentation of practical applications of the method, see next points for details.

      Thank you for noting the method’s strengths and core features. Regarding weaknesses, we have revised the manuscript again and added direct benchmarking, now on four datasets, plus a fifth “worked example” (https://www.youtube.com/watch?v=3UOvvpKxEAo&t=4s) in a new Figure 4.

      We also re-wrote the paper to more thoroughly present the work (previously we adhered to the “Brief Communication” eLife format), and added an explicit note in the results about model assumptions.

      (2) The experimental set-up for the additional datasets seems to be unrealistic as hyperparameters for instance segmentation are derived from a grid search and it is unclear how a new user could find good parameters in the plugin without having access to already annotated ground-truth data or an extensive knowledge of the underlying implementations.

      We agree that, of course, with any self-supervised method the user will need a sense of what a good outcome looks like; that is why we provide Google Colab Notebooks (https://github.com/AdaptiveMotorControlLab/CellSeg3D/tree/main/notebooks) and the napari-plugin GUI for extensive visualization and even the ability to manually correct small subsets of the data and refine the WNet3D model.

      We attempted to make this more clear with a new Figure 2 and additional functionality directly into the plugin (such as the grid search). But, we believe this “trade-off” for SSL approaches over very labor intensive 3D labeling is often worth it; annotators are also biased so extensive checking of any GT data is equally required.

      We also added the “grid search” functionality in the GUI (please `pip install --upgrade napari-cellseg3d`; the latest is v0.2.1) to supplement the previously shared Notebook (https://github.com/C-Achard/cellseg3d-figures/blob/main/thresholds_opti/find_best_thresholds.ipynb) and added a new YouTube video: https://www.youtube.com/watch?v=xYbYqL1KDYE.
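
As a rough illustration of what a threshold grid search involves, the sketch below scans candidate thresholds on a small annotated sub-volume and keeps the one with the best overlap score. It is not the plugin's implementation: the voxel-wise Dice score stands in for whatever metric the plugin or notebook uses, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def dice_score(pred, truth):
    """Voxel-wise Dice (equivalent to voxel-wise F1) between two boolean masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def grid_search_threshold(probabilities, truth, candidates):
    """Return (best_threshold, best_score) over the candidate thresholds."""
    scores = [dice_score(probabilities > t, truth) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])


# Synthetic example: foreground voxels score near 0.8, background near 0.2.
rng = np.random.default_rng(0)
truth = np.zeros((16, 16, 16), dtype=bool)
truth[4:12, 4:12, 4:12] = True
probabilities = np.where(truth, 0.8, 0.2) + rng.normal(0.0, 0.05, truth.shape)
candidates = np.linspace(0.1, 0.9, 9)
best_threshold, best_score = grid_search_threshold(probabilities, truth, candidates)
```

A threshold selected this way is tuned to the annotated subset, so checking it visually on a held-out crop is a sensible precaution against over-optimistic settings.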

      (3) Obtaining segmentation results of similar quality as reported in the experiments within the napari plugin was not possible for me. I tried this on the "MouseSkull" dataset that was also used for the additional results in the paper.

      Again we are sorry this did not work for you, but we added new functionality in the GUI and made a demo video (https://www.youtube.com/watch?v=U2a9IbiO7nE) where you either update your CellSeg3D code or watch the video to see how we obtained these results.

      Here, I could not find settings in the "Utilities->Convert to instance labels" widget that yielded good segmentation quality and it is unclear to me how a new user could find good parameter settings. In more detail, I cannot use the "Voronoi-Otsu" method due to installation issues that are prohibitive for a non expert user and the "Watershed" segmentation method yields a strong oversegmentation.

      Sorry to hear of the installation issue with Voronoi-Otsu; we updated the documentation and the GUI to hopefully make this easier to install. While we do not claim this code is for beginners, we do aim to be a welcoming community, thus we provide support on GitHub, extensive docs, videos, the GUI, and Google Colab Notebooks to help users get started.

      Comments on revised version

      Many of my comments were addressed well:

      -  It is now clear that the results are reproducible as they are well documented in the provided notebooks, which are now much more prominently referenced in the text.

      Thanks!

      -  My concerns about an unfair evaluation compared to CellPose and StarDist were addressed. It is now clear that the experiments on the mesoSPIM dataset are extensive and give an adequate comparison of the methods.

      Thank you; to note we additionally added benchmarking of Cellpose and StarDist on the three additional datasets (for R1), but hopefully this serves to also increase your confidence in our approach.

      -  Several other minor points like reporting of the evaluation metric are addressed.

      I have changed my assessment of the experimental evidence to incomplete/solid and updated the review accordingly. Note that some of my main concerns with the usability of the method for segmentation tasks with more complex morphology / more crowded cells and with the napari plugin still persist. The main points are (also mentioned in Weaknesses, but here with reference to the rebuttal letter):

      - Method comparison on datasets with more complex morphology etc. are missing. I disagree that it is enough to do this on one dataset for a good method comparison.

      We benchmarked WNet3D (our contribution) on four datasets, and to aid the readers we additionally now added Cellpose and StarDist benchmarking on all four. WNet3D performs favorably, even on the crowded and complex Mouse Skull data. See the new Figure 3 as well as the associated video: https://www.youtube.com/watch?v=U2a9IbiO7nE&t=1s.

      -  The current presentation still implies that CellSeg3d **and the napari plugin** work well for a dataset with complex nucleus morphology like the Mouse Skull dataset. But I could not get this to work with the napari plugin, see next points.

      - First, deriving hyperparameters via grid search may lead to over-optimistic evaluation results. How would a user find these parameters without having access to ground-truth? Did you do any experiments on the robustness of the parameters?

      -  In my own experiments I could not do this with the plugin. I tried this again, but ran into the same problems as last time: pyClesperanto does not work for me. The solution you link requires updating openCL drivers and the accepted solution in the forum post is "switch to a different workstation".

      We apologize for the confusion here; the accepted solution (not accepted by us) was user specific: that user switched workstations and it worked, so that was their solution. Other comments in the thread also solved the issue. For ease, this package can be installed on Google Colab (here is the link from our repo: https://colab.research.google.com/github/AdaptiveMotorControlLab/CellSeg3d/blob/main/notebooks/Colab_inference_demo.ipynb), where pyClesperanto can be installed without issue via `!pip install pyclesperanto-prototype`.

      This a) goes beyond the time I can invest for a review and b) is unrealistic to expect computationally inexperienced users to manage. Then I tried with the "watershed" segmentation, but this yields a strong oversegmentation no matter what I try, which is consistent with the predictions that look like a slightly denoised version of the input images and not like a proper foreground-background segmentation. With respect to the video you provide: I would like to see how a user can do this in the plugin without having a prior knowledge on good parameters or just pasting code, which is again not what you would expect a computationally unexperienced user to do.

      We agree with the reviewer that the user needs domain knowledge, but we never claim our method was for inexperienced users. Our main goal was to show a new computer vision method with self-supervised learning (WNet3D) that works on LSM and confocal data for cell nuclei. To this end, we made you a demo video to show how a user can visually perform a thresholding check https://www.youtube.com/watch?v=xYbYqL1KDYE&t=5s, and we added all of these new utilities to the GUI, thanks for the suggestion. Otherwise, the threshold can also be done in a Notebook (as previously noted).

      I acknowledge that some of these points are addressed in the limitations, but the text still implies that it is possible to get good segmentation results for such segmentation problems: "we believe that our self-supervised semantic segmentation model could be applied to more challenging data as long as the above limitations are taken into account." From my point of view the evidence for this is still lacking and would need to be provided by addressing the points raised above for me to further raise the Incomplete/solid rating, especially showing how this can be done with the napari plugin. As an alternative, I would also consider raising it if the claims are further reduced and acknowledge that the current version of the method is only a good method for well separated nuclei.

      We hope our new benchmarking and clear demo on four datasets help improve your confidence in the evidence for our approach. We also refined our overall text and hope our contributions, the limitations, and the advantages are now clearer.

      I understand that this may be frustrating, but please put yourself in the role of a new reader of this work: the impression that is made is that this is a method that can solve 3D segmentation tasks in light-sheet microscopy with unsupervised learning. This would be a really big achievement! The wording in the limitation section sounds like strategic disclaimers that imply that it is still possible to do this, just that it wasn't tested enough.

      But, to the best of my assessment, the current version of the method only enables the more narrow case of well separated nuclei with a simple morphology. This is still a quite meaningful achievement, but more limited than the initial impression. So either the experimental evidence needs to be improved, including a demonstration how to achieve this in practice, including without deriving parameters via grid-search and in the plugin, or the claim needs to be meaningfully toned down.

      Thanks for raising this point; we do think that WNet3D and the associated CellSeg3D package, which aims to continue integrating state-of-the-art models, are a non-trivial step forward. Have we completely solved the problem? Certainly not. But given the limited 3D cell segmentation tools that exist, we hope this, coupled with our novel 3D dataset, pushes the field forward. We do not show that it works only in the narrow, well-separated use case; rather, we show that it works even better than supervised models on the very challenging Mouse Skull benchmark. Given that we now show evidence that we outperform or match supervised algorithms with an unsupervised approach, we respectfully do think this is a noteworthy achievement. Thank you for your time in assessing our work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity.

      The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry that we didn’t provide sufficient grounds to introduce the subtypes. We have updated this in the revised manuscript, in Lines 102-104 as:

      “We determined these modulations on the basis of the classical cosine tuning model (Georgopoulos et al., 1982) and several previous studies (Bremner and Andersen, 2012; Pesaran et al., 2010; Sergio et al., 2005).”

      In our study, we applied the subtype analysis as a criterion to identify the modulation in neuron populations, rather than sorting neurons into exclusively different cell types.
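
For orientation, one common way to write these three modulation patterns on top of the cosine tuning model is sketched below. The baseline cosine form follows Georgopoulos et al. (1982); the condition-dependent terms $\Delta\theta_c$, $g_c$, and $a_c$ (one value per target-motion condition $c$) are illustrative assumptions and may not match the exact parameterization used in the manuscript.

```latex
\begin{align}
f(\theta) &= b + k\cos(\theta - \theta_{PD})
  && \text{(cosine tuning, no modulation)} \\
f_c(\theta) &= b + k\cos(\theta - \theta_{PD} - \Delta\theta_c)
  && \text{(PD shift)} \\
f_c(\theta) &= b + g_c\,k\cos(\theta - \theta_{PD})
  && \text{(gain)} \\
f_c(\theta) &= b + a_c + k\cos(\theta - \theta_{PD})
  && \text{(addition)}
\end{align}
```

Here $f$ is the firing rate, $\theta$ the reach direction, $\theta_{PD}$ the preferred direction, $b$ the baseline rate, and $k$ the modulation depth. A neuron with mixed modulation would have more than one of the condition-dependent terms differing from its neutral value ($\Delta\theta_c = 0$, $g_c = 1$, $a_c = 0$).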

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      In our study, the mixed selectivity, or specifically the target-motion modulation of reach-direction tuning, is a significant feature of the single neurons. We categorized the neurons into three subclasses not to claim absolute cell types, but to distinguish target-motion modulation patterns. To further characterize these three patterns, we also investigated their interaction by perturbing connection weights in the RNN.

      Yes, it’s important to consider the role of rotating tuning curves in neural dynamics during interception. In our case, we observed population neural state with sliding windows, and we focused on the period around movement onset (MO) due to the unexpected ring-like structure and the highest decoding accuracy of transferred decoders (Figure S7C). Then, the single-unit analyses were implemented.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We’ve introduced these relevant studies in the updated manuscript in Lines 422-426 as:

      “To further clarify, the discussing target-motion effect is different from the sensory modulation in action selection (Cisek and Kalaska, 2005), motor planning (Pesaran et al., 2006), visual replay and somatosensory feedback (Pruszynski et al., 2011; Stavisky et al., 2017; Suway and Schwartz, 2019; Tkach et al., 2007), because it occurred around movement onset and in predictive control trial-by-trial.”

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      A great suggestion; however, it is hardly feasible as the Utah arrays have already been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      A nice suggestion. The goodness of fit of the simple model (only movement direction) is much worse than that of the complex models (including target speed). We’ve updated the results in the revised manuscript in Lines 119-122, as “We found that the adjusted R² of a full model (0.55 ± 0.24, mean ± sd.) can be higher than that of the PD shift (0.47 ± 0.24), gain (0.46 ± 0.22), additive (0.41 ± 0.26), and simple models (only reach direction, 0.34 ± 0.25) for three monkeys (1162 neurons, ranksum test, one-tailed, p<0.01, Figure S5).”
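
To make the model comparison concrete, the sketch below fits a direction-only cosine model and a model with an added condition-dependent offset to one simulated neuron and compares their adjusted R². It is a minimal illustration under stated assumptions, not the manuscript's analysis: the simulated firing rates, the three hypothetical target-motion conditions, and the helper names are all made up for this example.

```python
import numpy as np


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 for a model with n_predictors regressors plus an intercept."""
    n = y.size
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def fit_and_score(design, y):
    """Least-squares fit; design must include the intercept column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return adjusted_r2(y, design @ coef, design.shape[1] - 1)


# Simulated neuron: cosine tuning plus a condition-dependent offset.
rng = np.random.default_rng(1)
n_trials = 400
theta = rng.uniform(0.0, 2.0 * np.pi, n_trials)  # reach direction (rad)
condition = rng.integers(0, 3, n_trials)         # hypothetical conditions 0..2
offsets = np.array([0.0, 4.0, 8.0])              # made-up additive effects (Hz)
rate = (20.0 + 10.0 * np.cos(theta - 1.0) + offsets[condition]
        + rng.normal(0.0, 2.0, n_trials))

# Cosine tuning is linear in (1, cos(theta), sin(theta)).
ones = np.ones(n_trials)
simple = np.column_stack([ones, np.cos(theta), np.sin(theta)])
# Additive model: same cosine terms plus dummy variables for conditions 1 and 2.
dummies = np.column_stack([condition == 1, condition == 2]).astype(float)
additive = np.column_stack([simple, dummies])

adj_r2_simple = fit_and_score(simple, rate)
adj_r2_additive = fit_and_score(additive, rate)
```

Because adjusted R² penalizes extra regressors, the richer model scores higher only when the added terms explain variance beyond what the extra parameters would absorb by chance.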

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      A good point. It is a pity that we haven’t found an appropriate unsupervised method.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. Because the movement onset (MO) is the key time point of this study, we colored this time period in Figure 1 to highlight the perimovement neuronal activity.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of a SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In the revised manuscript, we’ve updated the continuous decoding results with support vector regression (SVR) in Figure S7A and in Lines 170-173 as:

      “These results were stable on the data of the other two monkeys and the pseudopopulation of all three monkeys (Figure S6) and reconfirmed by the continuous decoding results with support vector regressions (Figure S7A), suggesting that target motion information existed in M1 throughout almost the entire trial.”

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We have performed decoding analysis on RNN units and updated in Figure S12A and Lines 333-334 as: “First, from the decoding result, target motion information existed in nodes’ population dynamics shortly after TO (Figure S12A).”

      We have also included the results of canonical correlation analysis and Procrustes analysis in Table S2 and Lines 340-342 as: “We then performed canonical correlation analysis (CCA) and Procrustes analysis (Table S2; see Methods); the results also indicated the similarity between network dynamics and neural dynamics.”
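
For readers unfamiliar with these two similarity measures, the sketch below computes a Procrustes disparity and canonical correlations between two low-dimensional trajectories using NumPy only. It is an illustration under stated assumptions rather than the authors' analysis: the ring-shaped "neural" trajectory, the rotated and rescaled "network" trajectory, and the function names are synthetic and hypothetical.

```python
import numpy as np


def procrustes_disparity(a, b):
    """Residual after optimally translating, scaling, and rotating b onto a.

    Both inputs are (samples x dims). 0 means identical shapes, 1 means unrelated.
    """
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    a0 = a0 / np.linalg.norm(a0)
    b0 = b0 / np.linalg.norm(b0)
    singular_values = np.linalg.svd(a0.T @ b0, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2


def canonical_correlations(a, b):
    """Canonical correlations between two (samples x dims) matrices.

    Assumes more samples than dims and full column rank in both inputs.
    """
    qa, _ = np.linalg.qr(a - a.mean(axis=0))
    qb, _ = np.linalg.qr(b - b.mean(axis=0))
    return np.clip(np.linalg.svd(qa.T @ qb, compute_uv=False), 0.0, 1.0)


# Synthetic "neural" ring trajectory and a rotated, rescaled, noisy "network" copy.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
neural = np.column_stack([np.cos(t), np.sin(t), 0.3 * np.sin(2.0 * t)])
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
network = 2.5 * neural @ rotation + rng.normal(0.0, 0.01, neural.shape)

disparity = procrustes_disparity(neural, network)
correlations = canonical_correlations(neural, network)
```

Procrustes allows only rotation, reflection, translation, and uniform scaling, so it is the stricter of the two; CCA permits any linear remapping of each dataset and can therefore report high similarity for trajectories whose geometry differs.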

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for your recognition of our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      This paper shows sensory modulation of motor tuning in single units and in the neural population during the motor execution period. Unfortunately, the findings are constrained to certain time windows. We are still working on this task and hope to address these questions in follow-up work.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation through dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      It’s a great idea! We are working on it, and it seems promising.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")-this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Thank you for this nice suggestion. While it was feasible to identify potent subspaces encoding reach direction and null spaces for target-velocity modulation, as suggested by the reviewer, the challenge remained that unsupervised methods were insufficient to isolate a pure target-velocity subspace from numerous possible candidates due to the small variance of target-velocity information. Although dPCA components can be used to construct orthogonal subspaces for individual task variables, we found that the target-velocity information remained highly entangled with reach-direction representation. More details can be found in Figure S8C and its caption, reproduced below:

      “We used dPCA components with different features to construct three subspaces (same data in A, reach-direction space #3, #4, #5; target-velocity space #10, #15, #17; interaction space #6, #11, #12), and we projected trial-averaged data into these orthogonal subspaces using different colormaps. This approach allowed us to obtain a “potent subspace” coding reach direction and a “null space” for target velocity. The results showed that the reach-direction subspace effectively represented the reach direction. However, while the target-velocity subspace encoded the target velocity information, it still contained reach-direction clusters within each target-velocity condition, corroborating the results of the addition model in the main text (Figure 4). The interaction subspace revealed that multiple reach-direction rings were nested within each other, similar to the findings from the gain model (Figure 3 & 4). The interaction subspace also captured more variance than target-velocity subspace, consistent with our PCA results, suggesting the target-velocity modulation primarily coexists with reach-direction coding. Furthermore, we explored alternative methods to verify whether orthogonal subspaces could effectively separate the reach direction and target velocity. We could easily identify the reach-direction subspace, but its orthogonal subspace was relatively large, and the target-velocity information exhibited only small variance, making it difficult to isolate a subspace that purely encodes target velocity.”
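      The potent/null decomposition discussed in this exchange can be illustrated with a minimal sketch. This is not the analysis code from the manuscript: it assumes a plain least-squares readout from neural activity to hand kinematics, takes the column space of that readout as the "potent space", and then looks for the highest-variance direction orthogonal to it. The function name `potent_null_split` and the synthetic data are hypothetical and exist only for illustration.

```python
import numpy as np

def potent_null_split(X, Y):
    """Split neural state space into a potent basis and a top null-space axis.

    X: (n_samples, n_neurons) neural activity.
    Y: (n_samples, n_outputs) hand kinematics (e.g. x/y velocity).
    Assumes n_neurons > n_outputs and a full-rank readout.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Least-squares readout W (n_neurons, n_outputs) such that Xc @ W ~ Yc.
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    # Orthonormal basis for the potent space (column space of W).
    Q, _ = np.linalg.qr(W)
    # Remove the potent component from every sample.
    X_null = Xc - Xc @ Q @ Q.T
    # Highest-variance direction of what remains (orthogonal to Q).
    _, _, Vt = np.linalg.svd(X_null, full_matrices=False)
    return Q, Vt[0]

# Synthetic example: 2D output driven by the first two "neurons".
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 30))
Y_demo = X_demo[:, :2] + 0.01 * rng.normal(size=(200, 2))
Q_demo, null_axis_demo = potent_null_split(X_demo, Y_demo)
```

      Projecting condition-averaged activity onto `Q_demo` and `null_axis_demo` would give the kind of view the reviewer describes. Whether the null axis isolates target velocity in real data is exactly the open question raised in the response above.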

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clarify the definition of the “none” type, which could be misleading. The 43% of unclassifiable nodes includes inactive ones; when only active (task-related) nodes are included, the ratio of unclassifiable nodes is much lower. We recomputed the ratios with only active units and have updated Table 1. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We considered moving neural states from one ring to another without changing the directional cluster. However, we found that this perturbation design might not be fully developed: since the top two PCs are highly correlated with movement direction, such a move—similar to exchanging two states within the same cluster but under different target-motion conditions—would presumably not affect the behavior.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non-trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We’ve highlighted the resampling results as an important control in the revised manuscript in Figure S11 and Lines 257-260 as:

      “To eliminate hand-speed effect, we resampled trials to construct a new dataset with similar distributions of hand speed in each target-motion condition and found similar orbital neural geometry. Moreover, the target-motion gain model provided a better explanation compared to the hand-speed gain model (Figure S11).”
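      The resampling control quoted above can be sketched as follows. The manuscript excerpt does not specify the resampling procedure, so this sketch assumes one common approach: bin trials by peak hand speed and subsample each target-motion condition to the same count within every bin. The function name `match_speed_distributions`, the bin count, and the synthetic speeds are all hypothetical.

```python
import numpy as np

def match_speed_distributions(speed, condition, n_bins=10, seed=0):
    """Return sorted trial indices subsampled so that every condition
    contributes the same number of trials to each hand-speed bin."""
    rng = np.random.default_rng(seed)
    speed = np.asarray(speed, dtype=float)
    condition = np.asarray(condition)
    edges = np.linspace(speed.min(), speed.max(), n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1
    # and the maximum speed falls in the last bin.
    bins = np.digitize(speed, edges[1:-1])
    keep = []
    for b in range(n_bins):
        members = [np.flatnonzero((bins == b) & (condition == c))
                   for c in np.unique(condition)]
        n_keep = min(len(m) for m in members)
        if n_keep == 0:
            continue  # some condition has no trials in this bin
        for m in members:
            keep.extend(rng.choice(m, size=n_keep, replace=False))
    return np.sort(np.array(keep, dtype=int))

# Synthetic example: three conditions with slightly different speeds.
rng = np.random.default_rng(0)
cond_demo = np.repeat([0, 1, 2], 300)
speed_demo = rng.normal(loc=1.0 + 0.2 * cond_demo, scale=0.3)
idx_demo = match_speed_distributions(speed_demo, cond_demo)
```

      The neural analyses would then be repeated on the trials in `idx_demo` only. Matching within bins discards trials from the tails of each distribution, so this kind of control trades statistical power for comparability.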

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (eg initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. In the revised manuscript, we’ve added the results of three alternative models in Table S4 and Lines 355-365 as below:

      “We also tested three alternative network models: (1) only receives motor intention and a GO-signal; (2) only receives target location and a GO-signal; (3) initialized with sparse connection (sparsity=0.1); the unmentioned settings and training strategies were as the same as those for original models (Table S4; see Methods). The results showed that the three modulations could emerge in these models as well, but with obviously distinctive distributions. In (1), the ring-like structure became overlapped rings parallel to the PC1-PC2 plane or barrel-like structure instead; in (2), the target-motion related tilting tendency of the neural states remained, but the projection of the neural states on the PC1-PC2 plane was distorted and the reach-direction clusters dispersed. These implies that both motor intention and target location seem to be needed for the proposed ring-like structure. The initialization of connection weights of the hidden layer can influence the network’s performance and neural state structure, even so, the ring-like structure”

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for these great suggestions. By perturbations, we intended to explore the contribution of interaction between certain subpopulations. We’ve included the ablation experiments in the updated manuscript in Table S3 and Lines 344-346 as below: “The ablation experiments showed that losing any kind of modulation nodes would largely deteriorate the performance, and those nodes merely with PD-shift modulation could mostly impact the neural state structure (Table S3).”

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting open questions. In the revised manuscript, we discuss these in Lines 435-446, as below: “It would be interesting to explore whether other motor areas also allow sensory modulation during flexible interception. The functional differences between M1 and other areas lead to uncertain speculations. Although M1 has pre-movement activity, it is more related to task variables and motor outputs. Recently, a cycling task sets a good example that the supplementary motor area (SMA) encodes context information and the entire movement (Russo et al., 2020), while M1 preferably relates to cycling velocity (Saxena et al., 2022). The dorsal premotor area (PMd) has been reported to capture potential action selection and task probability, while M1 not (Cisek and Kalaska, 2005; Glaser et al., 2018; Wang et al., 2019). If the neural dynamics of other frontal motor areas are revealed, we might be able to tell whether the orbital neural geometry of mixed selectivity is unique in M1, or it is just inherited from upstream areas like PMd. Either outcome would provide us some insights into understanding the interaction between M1 and other frontal motor areas in motor planning.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      At times the writing was a little hard to parse. It could benefit from being fleshed out a bit to link sentences together better.

      There are a few grammatical errors, such as:

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact each other, so the PD-shift nodes should not be neglected."

      should be

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact WITH each other, so the PD-shift nodes should not be neglected."

      The discussion could also be more extensive to benefit non-experts in the field.

      Thank you. We have proofread and polished the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Other comments:

      - The authors mention mixed selectivity a few times, but Table 1 doesn't have a column for mixed selective neurons--this seems like an important oversight. Likewise, it would be good to see an example of a "mixed" neuron.

      - The structure of the writing in the results section often talked about the supplementary results before the main results - this seems backwards. If the supplementary results are important enough to come before the main figures, then they should not be supplementary. Otherwise, if the results are truly supplementary, they should come after the main results are discussed.

      - Line 305: Authors say "most" RNN units could be classified, and this is technically true, but only barely, according to Table 1. It might be good to put the actual percentage here in the text.

      - Figure 5a: typo ("Motion intention" rather than "Motor")

      - I couldn't find any mention of code or data availability in the manuscript.

      - There were a number of lines that didn't make much sense to me and should probably be rewritten or expanded on:

      - Lines 167-168: "These results qualitatively imply the interaction as that target speeds..."

      - Lines 178-179: "However, these neural trajectories were not yet the ideal description, because they were shaped mostly by time."

      - Lines 187-188: "...suggesting that target motion affects M1 neural dynamics via a topologically invariant transformation."

      - Lines 224-226: "Note that here we performed an linear transformation on all resulting neural state points to make the ellipse of the static condition orthogonal to the z-axis for better visualization." Does this mean that the z-axis is not PC 3 anymore?

      - Lines 272-274: "These simulations suggest that the existence of PD-shift and additive modulation would not disrupt the neural geometry that is primarily driven by gain modulation; rather it is possible that these three modulations support each other in a mixed population."

      Thank you for these detailed suggestions. By “mixed selectivity”, we mean joint tuning to both target motion and movement. In this sense, the target-motion modulated neurons (regardless of the modulation type) are of mixed selectivity. The term “motor intention” follows Mazzoni et al., 1996, Journal of Neurophysiology. We also revised the manuscript for better readability.

      We have updated the data and code availability in Data availability as below:

      “The example experimental datasets and relevant analysis code have been deposited in Mendeley Data at https://data.mendeley.com/datasets/8gngr6tphf. The RNN relevant code and example model datasets are available at https://github.com/yunchenyc/RNN_ringlike_structure.”

      Reviewer #3 (Recommendations For The Authors):

      Minor typos:

      Line 153: “there were”

      Line 301: “network was trained to generate”

      Line 318: “interact with each other”

      Suggested reformulations:

      Line 310: “tilting angles followed a pattern similar to that seen in the data”

      Line 187: the claim of a “topologically invariant transformation” seems strong as the analysis is quite qualitative.

      Suggested changes to the paper (aside from those mentioned in the main review): It could be nice to show behaviour in a main figure panel early on in the paper. This could help with the task description (as it would directly show how the trials are separated based on endpoint) and could allow for discussing the potential caveats of the assumption that behaviour is preserved.

      Thank you. We have corrected these typos and writing problems. Because a similar task design has been reported previously, we decided not to provide extra figures or videos. Still, we appreciate this nice suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript by Thornlow Lamson et al., the authors develop a "beads-on-a-string" or BOAS strategy to link diverse hemagglutinin head domains, to elicit broadly protective antibody responses. The authors are able to generate varying formulations and lengths of the BOAS and immunization of mice shows induction of antibodies against a broad range of influenza subtypes. However, several major concerns are raised, including the stability of the BOAS, that only 3 mice were used for most immunization experiments, and that important controls and analyses related to how the BOAS alone, and not the inclusion of diverse heads, impacts humoral immunity are missing.

      Strengths:

      Vaccine strategy is new and exciting.

      Analyses were performed to support conclusions and improve paper quality.

      Weaknesses:

      Controls for how different hemagglutinin heads impact immunity versus the multivalency of the BOAS.

      Only 3 mice were used for most experiments.

      There were limited details on size exclusion data.

      We appreciate the reviewer’s comments and have made the following changes to the manuscript.

      (1) We recognize that deconvoluting the effect of including a diverse set of HA heads and multivalency in the BOAS immunogens is necessary to understand the impact on antigenicity. Therefore, we now include a cocktail of the identical eight HA heads used in the 8-mer and BOAS nanoparticle (NP) as an additional control group. While we observed similar HA binding titers relative to the 8-mer and BOAS NP groups, sera elicited in the cocktail group were unable to neutralize any of the viruses tested; multivalency thus appears to be important for eliciting neutralizing responses.

      (2) We increased the sample size by repeating the immunizations with n=5 mice, for a total of n=8 mice across two independent experiments.

      (3) We expanded the details on size exclusion data to include:

      a) extended chromatograms from Figure 2C as Supplemental Figure 3.

      b) additional details in the materials and methods section (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, where they link, using a non-flexible glycine linker, up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenzas and assess their immunogenicity. They also display some of their immunogens on ferritin NP and compare the immunogenicity. They conclude that this new platform can be useful to elicit robust immune responses to multiple influenza subtypes using one immunogen and that it can also be used for other viral proteins.

      Strengths:

      The paper is clearly written. While the use of flexible linkers has been used many times, this particular approach (linking different HA subtypes in the same construct resembling adding beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Weaknesses:

      The authors did not compare to individual HAs administered as cocktails and did not compare to other mosaic NPs published earlier. It is thus difficult to assess how their BOAS compare.

      Other weaknesses include the rationale as to why these subtypes were chosen and also an explanation of why there are different sizes of the HA1 construct (apart from expression). Have the authors tried other lengths? Have they expressed all of them as FL HA1?

      We appreciate the reviewer’s comments. We responded to the concerns below and modified the manuscript accordingly.

      (1) We recognize that including a “cocktail” control is important to understand how the multivalency present in a single immunogen affects the immune response. We now include an additional control group comprised of a mixture of the same eight HA heads used in the 8-mer and the BOAS nanoparticle (NP). While this cocktail elicited similar HA binding titers relative to the 8-mer and BOAS NP immunogens (Fig. 6G), there was no detectable neutralization of any of the viruses tested (Fig. 7).

      (2) In the introduction we reference other multivalent display platforms but acknowledge that distinct differences in their immunogen design platforms make direct comparisons to ours difficult, which is ultimately why we did not use them as comparators for our in vivo studies. Perhaps most directly relevant to our BOAS platform is the mosaic HA NP from Kanekiyo et al. (PMID 30742080). Here, HA heads, with similar boundaries to ours, were selected from historical H1N1 strains. These NPs, however, were significantly less antigenically diverse relative to our BOAS NPs as they did not include any group 2 (e.g., H7, H9) or B influenza HAs; restricting their multivalent display to group 1 H1N1s likely was an important factor in how they were able to achieve broad, neutralizing H1N1 responses. Additionally, Cohen et al. (PMID 33661993) used similarly antigenically distinct HAs in their mosaic NP, though these included full-length HAs with the conserved stem region, which likely has a significant impact on the elicited cross-reactive responses observed. Lastly, we reference Hills et al. (PMID 38710880), where the authors designed similar NPs with four tandemly-linked betacoronavirus receptor binding domains (RBDs) to make “quartets”. In contrast to our observations, the authors observed increased binding and neutralization titers following conjugation to protein-based NPs. We acknowledge potential differences between the studies, such as the antigen and larger VLP NP, that could lead to the different observed outcomes.

      (3) We intended to highlight the “plug-and-play” nature of the BOAS platform; theoretically any HA subtype could be interchanged into the BOAS. To that end, our rationale for selecting the HA subtypes in our proof-of-principle immunogen was to include an antigenically diverse set of circulating and non-circulating HAs that we could ultimately characterize with previously published subtype-specific antibodies that were also conformation-specific. In doing so, these diagnostic antibodies could confirm the presence and conformational integrity of each component. We intentionally did not include HA subtypes for which we did not have a conformation-specific antibody.

      The different sizes of the HA head domains were determined exclusively by expression of the recombinant protein. We have not attempted expression of full-length HA1 domains. Furthermore, we have not attempted to express the full-length HA (inclusive of HA1 and HA2) in our BOAS platform. The primary reason was to avoid including the conserved stem region of HA2 which may distract from the HA1 epitopes (e.g., receptor binding site, lateral patch) that can be engaged by broadly neutralizing antibodies. Additionally, the full-length HA is inherently trimeric and may not be as amenable to our BOAS platform as the monomeric HA1 head domain.

      Reviewer #3 (Public Review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response.

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data.

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of the optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform.

      One interesting and counter-intuitive result is the high levels of neutralization titers seen in vaccine-mismatched, group 2 H7 in the 5-BOAS group that differs from the 4-BOAS with the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are being directed and why these discrepancies are being observed would constitute informative future work.

      There are a few caveats in the data that should be noted:

      (1) 20 ug is a pretty high dose for a mouse and the majority of the serology presented is after 3 doses at 20 ug. By comparison, 0.5-5 ug is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 ug per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group was given a lower gram dose of HA RBD relative to the BOAS groups.

      We agree that this is on the “upper end” of recombinant protein dose. While we did not perform a dose-response study, we now include serum analyses after a single prime. The overall trends and reactivity to matched and mismatched BOAS components remained similar across d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which reinforces our observation that the multivalency of the HA heads is necessary for eliciting robust serum responses to each component. These data are included in Supplemental Figure 5, and we’ve modified the text (lines 185-187) to include:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      Additionally, we acknowledge that there is a size discrepancy between the BOAS NP and the largest BOAS, leading to an approximately 15-fold difference on a per-mole basis of the BOAS immunogen. The smallest and largest BOAS also differ by ~2.5-fold on a per-mole basis; this could favor the overall amount of the smaller immunogens. However, because vaccine doses are typically calculated on a mg per kg basis, we did not calculate on a molar basis for this study. Any promising immunogens will be evaluated in a dose-response study to optimize elicited responses.
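
      The mass-versus-molar point above can be illustrated with a minimal sketch. The molecular weights below are hypothetical placeholders chosen only to reproduce a 15-fold size ratio; they are not measured values for any BOAS or BOAS NP, and the function name is an assumption made for illustration.

```python
def picomoles(mass_ug: float, mw_kda: float) -> float:
    """Convert a mass in micrograms to picomoles for a molecular weight in kDa."""
    # 1 kDa corresponds to 1000 g/mol; ug divided by g/mol gives umol; x 1e6 gives pmol.
    return mass_ug / (mw_kda * 1000.0) * 1e6


DOSE_UG = 20.0           # mass dose per immunization, as stated in the response
SMALLER_MW_KDA = 100.0   # hypothetical molecular weight of the smaller immunogen
LARGER_MW_KDA = 1500.0   # hypothetical molecular weight, 15x larger

smaller_pmol = picomoles(DOSE_UG, SMALLER_MW_KDA)
larger_pmol = picomoles(DOSE_UG, LARGER_MW_KDA)

# At equal mass, the molar ratio is the inverse of the molecular-weight ratio.
fold_difference = smaller_pmol / larger_pmol
print(f"{smaller_pmol:.1f} pmol vs {larger_pmol:.1f} pmol ({fold_difference:.1f}-fold)")
```

      Under these assumed weights, the same 20 µg dose delivers 15 times more molecules of the smaller immunogen, which is the kind of discrepancy a molar dose calculation would remove.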

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data.

      We repeated the neutralization assays with data points for individual mice. There does appear to be variability in the immune response between mice. This is most noticeable for responses to the H5 component. We are currently assessing what properties of our BOAS immunogen might contribute to the variability across individual mice.

      (3) In Figure S2, it looks like an apparent increase in MW by changing the order of strains here, which may be due to differences in glycosylation. Further analysis would be needed to determine if there are discrepancies in glycosylation amongst the BOAS immunogens and how those differ from native HAs.

      There does appear to be a relatively small difference in MW between the two BOAS configurations shown in Figure S2. This could be due to differences in glycosylation, as the reviewer points out, and in future studies, we intend to assess the influence of native glycosylation on antibody responses elicited by our BOAS immunogens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Concerns

      (1) From Figure 2D-E, it looks like BOAS are forming clusters, rather than a straight line. Do these form aggregates over time? Both at 4 degrees over a few days or after freeze-thaw cycle(s)? It is unclear from the SEC methods how long after purification this was performed and stability should be considered.

      Due to the inherent flexibility of the Gly-Ser linker between each component, we do not anticipate that any rigidity would be imposed resulting in a “straight line”. Nevertheless, we appreciate the reviewer’s concern about the long-term stability of the BOAS immunogens. To address this, we include 1) the extended chromatograms from Figure 2C as Supplemental Figure 3 to show any aggregates present, 2) traces from up to 48 hours post-IMAC, and 3) chromatograms following a freeze-thaw cycle. Post-IMAC purification, there is a minor peak (<10% of total peak height) at ~9 mL corresponding to aggregation. Note that we excluded this aggregate fraction from immunizations. Following a freeze-thaw cycle, when analyzed soon after thawing (<24 hrs), the BOAS maintain a homogeneous peak with no significant (<10%) aggregation or degradation peak. However, after ~1 week at 4C following the freeze-thaw cycle, additional peaks within the chromatogram correspond to degradation of the BOAS.

      We modified the materials and methods section to state (lines 370-372)

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      We commented on BOAS stability in the results section (lines 142-148)

      “Following SEC, affinity tags were removed with HRV-3C protease; cleaved tags, uncleaved BOAS, and His-tagged enzyme were removed using cobalt affinity resin and snap frozen in liquid nitrogen before immunizations. BOAS maintained monodispersity upon thawing, though over time, degradation was observed following longer term (>1 week) storage at 4C (Supplemental Figure 3). This degradation became more significant as BOAS increased in length (Supplemental Figure 3).”

      We also included in the discussion (lines 277-279):

      “Notably, for longer BOAS we observed degradation following longer term storage at 4C, which may reflect their overall stability.”

      (2) Figures 3-4 and 6-7, to make conclusions off of 3 mice per group is inappropriate. A sample size calculation should have been conducted and the appropriate number of mice tested. In addition, two independent mouse experiments should always be performed. Moreover, the reliability of the statistical tests performed seems unlikely, given the very small sample size.

      We agree that additional mice are necessary to make assessments regarding immunogenicity and cross-reactivity differences between the immunogens. To address this, we repeated the immunization with 5 additional mice, for a total of n=8 mice over two independent experiments. We incorporated these data into Figure 3B-D, as well as an additional Figure 3E (see below). We also now report the log-transformed endpoint titer (EPT) values rather than reciprocal EC50 values and have clarified the statistical analyses used. We have added the following lines to the methods section:

      lines 427-431:

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which the blank-subtracted 450nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log transformed values of the EPT dilution.”

      lines 406-408:

      “C57BL/6 mice (Jackson Laboratory) (n=8 per group for 3-, 4-, 5-, 6-, 7-, and 8mer cohorts; n=5 for BOAS NP, NP, and mix cohorts) were immunized with 20µg of BOAS immunogens of varying length and adjuvanted with 50% Sigma Adjuvant for a total of 100µL of inoculum.”

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays were determined using Prism (GraphPad Prism v10.2.3). ELISAs comparing serum reactivity and microneutralization and comparing >2 samples were analyzed using a Kruskal-Wallis test with Dunn’s post-hoc test to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. ELISAs comparing two samples were analyzed using a Mann-Whitney test. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
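
      As an illustration of the statistics described above, the following minimal sketch computes the Kruskal-Wallis H statistic and maps a p-value to the asterisk notation quoted in the methods. It is not the authors' analysis, which was run in GraphPad Prism; the sketch omits the tie correction, the p-value calculation, and Dunn's post-hoc test, and the function names and the "ns" label are assumptions made for illustration.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of sample groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def significance_stars(p):
    """Map a p-value to the asterisk notation given in the methods text."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # non-significant; this label is an assumption


# Hypothetical log-transformed titers for three cohorts.
example_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_statistic = kruskal_wallis_h(example_groups)
```

      In practice H would be compared against a chi-squared distribution with (number of groups − 1) degrees of freedom, and pairwise comparisons would follow only if the omnibus test is significant.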

      (3) One critical control that is missing is a homogenous BOAS, for example, just linking one H1 on a BOAS. Does oligomerization and increasing avidity alone improve humoral immunity?

      We agree that this is an interesting point. To address the impact of oligomerization and avidity on humoral immunity, we now include an additional control with a cocktail of the HA heads used in the 8mer. We have incorporated this into Figure 3A, 3D and 3E, Figure 6G, and Figure 7.

      Additionally, we have added the following lines in the manuscript:

      lines 38-40:

      “Finally, vaccination with a mixture of the same HA head domains is not sufficient to elicit the same neutralization profile as the BOAS immunogens or nanoparticles.”

      lines 105-106:

      “Additionally, we showed that a mixture of the same HA head components was not sufficient to recapitulate the neutralizing responses elicited by the BOAS or BOAS NP.”

      lines 169-172:

      “To determine immunogenicity of each BOAS immunogen, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A). We compared these BOAS to a control group immunized with a mixture of the eight HA heads present in the 8mer.”

      lines 265-267:

      “There were qualitatively immunodominant HAs, notably H4 and H9, and these were relatively consistent across BOAS in which they were a component. This effect was reduced in the mix cohort.”

      (4) While some cross-reactivity is likely (Figure 6G), there is considerable loss of binding when there is a mismatch. Of the antibodies induced, how much of this is strain-specific? For example, how well do serum antibodies bind to a pre-2009 H1?

      We agree with the reviewer that there is a considerable loss of binding when there is a mismatched HA component. To better understand this and incorporate a mismatched strain into our analysis of the 8mer and BOAS NP, we looked at serum binding titers to a pre-2009 H1, H1/Solomon Islands/2006, and an antigenically distinct H3, H3/Hong Kong/1968. We have incorporated this data into Figures 3D, 3E, 6F and 6G. We observed relatively high titers against both a mismatched H1 and H3, indicating that the BOAS maintain high titers against subtype-specific strains that are conserved over considerable antigenic distance. However, this was similar in the mixture group, indicating that this may not be specific to oligomerization of BOAS immunogens.

      We added the following to the methods section:

      lines 357-361

      “Head subdomains from these HAs were used in the BOAS immunogens, and full-length soluble ectodomain (FLsE) trimers were used in ELISAs. Additional H1 (H1/A/Solomon Islands/3/2006) and H3 (H3/A/Hong Kong/1/1968) FLsEs were used in ELISAs as mismatched, antigenically distinct HAs for all BOAS.”

      Minor Concerns

      (1) Line 44-46, the deaths per year are almost exclusively due to seasonal influenza outbreaks caused by antigenically drifted viruses in humans, not those spilling over from avian sp. and swine. For accuracy, please adjust this sentence.

      We have adjusted lines 45-48 to say “This is largely a consequence of viral evolution and antigenic drift as it circulates seasonally within humans and ultimately impacts vaccine effectiveness. Additionally, the chance for spillover events from animal reservoirs (e.g., avian, swine) is increasing as population and connectivity also increase.”

      (2) Figure 4D-E, provide a legend for what the symbols indicate, or simply just put the symbol next to either the homology score and % serum competition labels on the y-axis.

      We have included a legend in Figures 4D and 4E to distinguish between homology score and % serum competition.

      (3) I am a bit confused by the data presented in Figure 7. The figure legend says the two symbols represent technical replicates. How? Is one technical replicate of all the mice in a group averaged and that's what's graphed? If so, this is not standard practice. I would encourage the authors to show the average technical replicates of each animal, which is standard.

      We thank the reviewer for their suggestion, and we have revised Figure 7 such that each symbol represents a single animal for n=5 animals. We have also adjusted the figure caption to the following:

      “Figure 7: Microneutralization titers to matched and mis-matched virus. Microneutralization of matched and mis-matched pseudoviruses: H1N1 (green, top left), H3N2 (orange, top right), H5N1 (yellow, bottom left), and H7N9 viruses (pink, bottom right) with d42 serum. Solid bars below each plot indicate a matched sub-type, and striped bars indicate a mis-matched subtype (i.e. not present in the BOAS). NP negative controls were used to determine threshold for neutralization. Upper and lower dashed lines represent the first dilution (1:32) (for H1N1, H3N2, and H5N1) or neutralization average with negative control NP serum (H7N9), and the last serum dilution (1:32,768), respectively, and points at the dashed lines indicate IC50s at or outside the limit of detection. Individual points indicate IC50 values from individual mice from each cohort (n=5). The mean is denoted by a bar and error bars are +/- 1 s.d., * = p<0.05 as determined by a Kruskal-Wallis test with Dunn’s multiple comparison post hoc test relative to the mix group.”

      (4) Paragraphs 298-313, multiple studies are referred to but not referenced.

      We have added the following references to this section:

      (38) Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 498, 102–106 (2013).

      (48) Hills, R. A. et al. Proactive vaccination using multiviral Quartet Nanocages to elicit broad anti-coronavirus responses. Nat. Nanotechnol. 1–8 (2024) doi:10.1038/s41565-024-01655-9.

      (65) Jardine, J. et al. Rational HIV immunogen design to target specific germline B cell receptors. Science 340, 711–716 (2013).

      (66) Tokatlian, T. et al. Innate immune recognition of glycans targets HIV nanoparticle immunogens to germinal centers. Science 363, 649–654 (2019).

      (67) Kato, Y. et al. Multifaceted Effects of Antigen Valency on B Cell Response Composition and Differentiation In Vivo. Immunity 53, 548-563.e8 (2020).

      (68) Marcandalli, J. et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell 176, 1420-1431.e17 (2019).

      (69) Bruun, T. U. J., Andersson, A.-M. C., Draper, S. J. & Howarth, M. Engineering a Rugged Nanoscaffold To Enhance Plug-and-Display Vaccination. ACS Nano 12, 8855–8866 (2018).

      (70) Kraft, J. C. et al. Antigen- and scaffold-specific antibody responses to protein nanoparticle immunogens. Cell Reports Medicine 100780 (2022) doi:10.1016/j.xcrm.2022.100780.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors define "detectable titers"?

      Maybe add a threshold value of reciprocal EC on the figure for each plot.

      We recognize the reviewer’s concern with reporting serum titers in this way, and we now report titers as endpoint titers (EPT) with a dotted line for the first detectable dilution (1:50). We have also adjusted the methods section to reflect this change:

      (lines 427-431)

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which the blank-subtracted 450nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log transformed values of the EPT dilution.”
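
      The endpoint-titer definition quoted above can be sketched as follows: once a 4PL curve has been fitted, the concentration at which it crosses the 0.1 absorbance threshold has a closed form, and the titer is the reciprocal of that relative concentration. This is a minimal sketch, not the authors' Prism workflow; the particular 4PL parameterization, the function names, and all parameter values are assumptions made for illustration.

```python
import math


def four_pl(x, bottom, top, ec50, hill):
    """Assumed 4PL form: signal rises from bottom to top as concentration x increases."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)


def concentration_at_threshold(threshold, bottom, top, ec50, hill):
    """Invert the 4PL curve: relative concentration where the signal equals threshold."""
    if not bottom < threshold < top:
        raise ValueError("threshold must lie strictly between bottom and top")
    ratio = (top - bottom) / (threshold - bottom) - 1.0
    return ec50 / ratio ** (1.0 / hill)


def log10_endpoint_titer(threshold, bottom, top, ec50, hill):
    """Log10 of the reciprocal dilution at which the fitted curve meets the threshold."""
    x = concentration_at_threshold(threshold, bottom, top, ec50, hill)
    return math.log10(1.0 / x)


# Hypothetical fitted parameters; x is serum concentration relative to neat serum,
# so an ec50 of 1e-3 corresponds to a 1:1000 dilution.
params = dict(bottom=0.02, top=2.0, ec50=1e-3, hill=1.0)
x_at_threshold = concentration_at_threshold(0.1, **params)
log_ept = log10_endpoint_titer(0.1, **params)
```

      With these assumed parameters the curve crosses the threshold at a dilution of 1:23,750, well beyond the EC50 dilution, which shows how an endpoint titer can register weak binding that an EC50 value does not capture.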

      It also appears that not all X-mer elicits an immune response against matched HA, e.g. for the 7 and 8 -mer. Not sure why the authors do not mention this. It could be due to too many HAs, not sure.

      We apologize for the confusion, and agree that our original method of reporting EC50 values does not reflect weak but present binding titers. Upon further analysis with additional mice as well as adjusting our method of reporting titers, it is easier to see in Figure 3D that all X-mer BOAS do indeed elicit detectable binding titers to matched HA components.

      It will be nice to add a conclusion to the cross-reactivity - again it appears that past 6-mer there has been a loss in cross-reactivity even though there are more subtypes on the BOAS.

      Also, the TI seemed to be the more conserved epitope targeted here.

      (Of note these two are mentioned in the discussion)

      We have updated the results section to include the following:

      (lines 281-294)

      “Based on the immunogenicity of the various BOAS and their ability to elicit neutralizing responses, it may not be necessary to maximize the number of HA heads into a single immunogen. Indeed, it qualitatively appears that the intermediate 4-, 5-, and 6mer BOAS were the most immunogenic and this length may be sufficient to effectively engage and crosslink BCR for potent stimulation. These BOAS also had similar or improved binding cross-reactivity to mis-matched HAs as compared to longer 7- or 8mer BOAS. Notably, the 3mer BOAS elicited detectable cross-reactive binding titers to H4 and H5 mismatched HAs in all mice. This observed cross-reactivity could be due to sequence conservation between the HAs, as H3 and H4 share ~51% sequence identity, and H1 and H2 share ~46% and ~62% overall sequence identity with H5, respectively (Supplemental Figure 6). Additionally, the degree of surface conservation decreased considerably beyond the 5mer as more antigenically distinct HAs were added to the BOAS. These data suggest that both antigenic distance between HA components and BOAS length play a key role in eliciting cross-reactive antibody responses, and further studies are necessary to optimize BOAS valency and antigenic distance for a desired response.”
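
      The sequence-identity figures cited above depend on how identity is counted. A minimal sketch of one common convention, counting identical residues over aligned columns where neither sequence has a gap, is shown below. The response does not state which convention was used, and the sequences here are toy strings, not HA sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy pairwise alignment (hypothetical, for illustration only).
identity = percent_identity("ACDE-FGH", "ACDQWFGK")
print(f"{identity:.1f}% identity")
```

      Other conventions divide by the full alignment length or by the shorter sequence length, so reported percentages are comparable only when the denominator is the same.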

      Figure 5E, the authors could indicate which subtype each mab is specific to for those who are not HA experts. (They have them color-coded but it is hard to see because very small).

      The authors also do not explain why 3E5 does not bind well to H1, H2, H3, H4 4-mer BOA, etc...

      We apologize for the lack of clarity in this figure. We updated Figure 5E to include the subtype it is specific for as well as listing the antibodies and their subtype and targeted epitope in the figure caption.

      Minor

      Figure 1B zoom looks like the line is hidden to the structure - should come in front

      We adjusted the figure accordingly.

      Line 127 - whether the order

      Corrected

      What is the rationale for thinking that a different order will lead to a different expression and antigenic results?

      We thank the reviewer for this question. We did not necessarily anticipate a difference in protein expression based on BOAS order. We did want to verify that our platform was indeed a “plug-and-play” platform in which we could readily exchange components and order. We do, however, hypothesize that a different order may in fact lead to different antigenic results. We think that the conformation of the BOAS as well as physical and antigenic distance of HA components may influence cross-linking efficiency of BCRs and lead to different antigenic results with different levels of cross-reactivity. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in proximity to each other or could be potentially shielded in certain conformations, and thus could affect antigenic results. We expand on this rationale in the discussion in lines 310-314:

      “Further studies with different combinations of HAs could aid in understanding how length and composition influences epitope focusing. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern could impact which HAs are in close proximity to one other or could be potentially shielded in certain conformations, and thus could affect antigenic results.”

      Maybe list HA#1 HA#2 HA#3 instead of HA1, HA2, HA3 to make sure it is not confounded with HA2 and HA2

      We agree that this may be confusing for readers, and have adjusted Figure 1C to show HA#1, HA#2, etc.

      For nsEM, do the authors have 2D classes and even 3D reconstructions? Line 148-149: maybe or just because there are more HAs.

      We did not obtain 2D classes or 3D reconstructions of these BOAS. However, we do agree with the reviewer that the collapsed/rosette structure of the 8mer BOAS may be a consequence of the additional HA heads as well as the flexible Gly-Ser linkers between the components. We have added clarity to our statement in the discussion to read:

      lines 154-156:

      “This is likely a consequence of the flexible GSS linker separating the individual HA head components as well as the addition of significantly more HA head components to the construct.”

      Line 153 " interface-directed" - what does this mean?

      We apologize for any confusion. We intend for “interface-directed” to refer to antibodies that engage the trimer interface (TI) epitope between HA protomers. We have adjusted the manuscript to use the same terminology throughout, i.e. trimer interface or its abbreviation, TI.

      For Figure 2 F - do you have a negative control? Usually one does not determine an ELISA KD, it is not very accurate but shows binding in terms of OD value.

      We did include a negative control, MEDI8852, a stem-directed antibody, though it was not shown in the figure because we observed no binding, as expected. This negative control antibody was also used in Figure 5E for characterizing the BOAS NPs, and also shows no binding. We recognize that in an ELISA the KD is an equilibrium measurement and we do not report kinetic measurements as determined by a method such as bio-layer interferometry (BLI), and have thus adjusted the figure caption to denote the values as “apparent K<sub>D</sub> values”.

      Line 169 - reads strangely, "BOAS-elicited serum, regardless of its length, reacted". The length is that of the immunogen, not the serum.

      We agree that this statement is unclear, and we have modified the sentence to read:

      lines 177-178:

      “Each of the BOAS, regardless of its length, elicited binding titers to all matched full-length HAs representing individual components (Figure 3D).”

      What is the adjuvant used (add in results)?

      We used Sigma adjuvant for all immunizations, and have included this information in the results section:

      lines 169-171:

      “To determine immunogenicity of each BOAS, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A).”

      This information is also included in the methods section in lines 406-412.

      Line 178 - remove " across"

      We have removed the word “across” in this sentence and replaced it with “on” (line 194)

      Trimer- interface, and interface epitopes are used exchangeably - maybe keep it as trimer interface to be more precise

      As stated above, we have adjusted the manuscript to use the same term throughout, i.e., trimer interface or its abbreviation, TI.

      Line 221 - no figure 6H (6G?)

      We apologize for this typo and have corrected to Figure 6G (line 231)

      Reviewer #3 (Recommendations For The Authors):

      (1) Since 20 ug x3 doses is quite a high amount of vaccine, differences between immunogens may become blurred. Thus, it may be informative to compare post-prime serology for all immunogens or select immunogens to compare to the post-3rd dose data.

      We agree with the reviewer that this is on the upper end of vaccine dose and thus we explored the serum responses after a single boost. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across days d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which bolsters our claim that the presentation of the HA heads is important for eliciting strong serum responses to all components. We have included this data in Supplemental Figure 5, and have acknowledged this in the text:

      lines 185-187:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      (2) Significance statistics for all immunogenicity data should be added and discussed; it is particularly absent in Figures 3D and 7.

      We have added statistical analyses to Figure 3 and Figure 7 to reflect changes in immunogenicity. We have also added the following to the methods section:

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays were determined using either a Mann-Whitney test or a Kruskal-Wallis test with Dunn’s post-hoc test in Prism (GraphPad Prism v10.2.3) to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”

      (3) Figure 2F: the figure has K03.12 listed for the H3-specific mAb and in the main text, but the caption says 3E5 - is the 3E5 in the caption a typo? 3E5 is listed for the competition ELISAs as an RBS mAb, but its binding site is distal to the RBS at residues 165-170 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787348/), H7.167 binds in the RBS periphery and not directly within the RBS, and the epitope for P2-D9 is undetermined/not presented. This could mean that there is actually a higher proportion of RBS-directed antibodies than what is determined from this serum competition data. Also, reference to these as 'RBS-directed' in the serum competition methods section should be revised for accuracy.

      We sincerely apologize for this error and the resulting confusion. 3E5 in the caption is incorrect and should be K03.12 (https://www.rcsb.org/structure/5W08) and does engage the receptor binding site. We also apologize for the oversight that H7.167 is in the RBS periphery and not directly in the RBS. The additional P2-D9 in the panel of RBS-directed antibodies was also in error, as we do not believe it is RBS-directed, but is indeed H4 specific. We also included a reference to the paper and immunogen that elicited this antibody. We agree that this indicates that there could be a higher proportion of RBS-directed antibodies in the serum and have modified the text in the results and methods sections to read:

      lines 300-306:

      “Notably, this proportion is approximate, as at the time of reporting, antibodies that bind the receptor binding site of all components were not available. RBS-directed antibodies to the H4 and H9 component were not available, and the RBS-directed antibodies used targeting the other HA components have different footprints around the periphery of the RBS. Additionally, there are currently no reported influenza B TI-directed antibodies in the literature. Therefore, this may be an underestimate of the serum proportion focused to the conserved RBS and TI epitopes.”

      lines 435-439:

      “Following blocking with BSA in PBS-T, blocking solution was discarded and 40µL of either DPBS (no competition control), a cocktail of humanized antibodies targeting the RBS and periphery (5J8, 2G1, K03.12, H5.3, H7.167, H1209), a cocktail of humanized TI-directed antibodies (S5V2-29, D1 H1-17/H3-14, D2 H1-1/H3-1), or a negative control antibody (MEDI8852) were added at a concentration of 100µg/mL per antibody.”

      (4) Only nsEM data is shown for the 3-BOAS and 8-BOAS, where differences in morphology were seen between these longer and shorter proteins. Including nsEM images for all BOAS immunogens may show trends in morphology or organization that could correlate with immune responses, e.g. if the 5-BOAS also forms a higher proportion of rosette-like structures, while the 4-BOAS is still a mix between extended and rosette-like, this could be a factor in the better immune responses seen for 5-BOAS.

      We appreciate the reviewer’s suggestion for further analysis of morphology between the intermediate BOAS sizes. We agree that the relationship between BOAS length and morphology should be explored more in depth, and we intend to do so in future studies and to also vary linker length and rigidity.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data.

      In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph. This work provides a useful resource for studying nitrogenase evolution.

      However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      We thank the reviewer for their thoughtful comments. We acknowledge that our current study is primarily focused on a computational exploration of the structural differences in both extant and ancestral nitrogenase variants, which allowed us to generate a comprehensive structural dataset. Although we did not carry out experimental reversion tests in this study, we agree that directly assessing the functional consequences of reverting the specific residues (lines 420 to 429) to their extant counterparts is an important next step to elucidate their functional role. Indeed, these findings provide a valuable foundation for our future work, which is designed to include experimental characterization of these variants and further elucidate the role of critical residues in nitrogenase activity and evolution. We believe that these experiments will offer the direct functional validation that the reviewer has rightly pointed out, and we look forward to reporting on these results in a future study.

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

      We thank the reviewer for this suggestion. Our original analysis (previously shown in Figure S9, now Figure S10) included structural alignment comparisons between the ancestral structures and the predicted models. In response, we have reorganized the results section (lines 351-355) to explicitly address this comparison.

      Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in the environment, including oxygen levels and changes in metal availability. The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. A lot of insight generated is based on minuscule differences in RMSD, for which it is not clear that they are significantly different. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others). It could also be valuable to focus more on site-specific RMSDs rather than Global RMSDs. The high conservation in the nitrogenases likely ensures that the global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative regarding the evolution of structure in tandem with environment/time.

      We thank the reviewer for their suggestions. We agree that while global RMSD values below 2Å typically indicate high structural similarity, relying solely on these measures can mask subtle yet potentially functionally meaningful differences. Our aim was not to test for overall structural identity but rather to quantify fine-scale variations between highly conserved nitrogenase structures, including extant and ancestral variants. Nevertheless, in light of the reviewer’s suggestions, we have implemented an additional metric (rmsd<sub>100</sub>) for a more nuanced comparison. The results of our additional analyses (Figure S3) align closely with our original results (Figure 2), supporting our decision to retain the un-normalized results in the main text. As an additional measure, we also computed site-specific RMSDs for the active site’s environments (Figure S6) to further delineate subtle structural variations.
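For readers unfamiliar with the normalized metric, the following is a minimal sketch of how a size-normalized RMSD can be computed. It assumes that rmsd<sub>100</sub> refers to the normalization proposed by Carugo and Pongor (2001), which rescales an RMSD to the value expected for a pair of structures with 100 aligned residues; the response itself does not spell out the formula, and the function name and guard below are illustrative only.

```python
import math


def rmsd100(rmsd: float, n_residues: int) -> float:
    """Rescale an RMSD (in angstroms) to a reference size of 100 aligned residues.

    Assumed formula (Carugo & Pongor, 2001):
        RMSD_100 = RMSD / (1 + ln(sqrt(N / 100)))
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    denominator = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denominator <= 0.0:
        # The correction breaks down for very short alignments (N below ~14).
        raise ValueError("alignment too short for this normalization")
    return rmsd / denominator


# An RMSD measured over 100 residues is unchanged by the normalization,
# while the same RMSD over a longer alignment is scaled down.
same_size = rmsd100(2.0, 100)
larger = rmsd100(2.0, 400)
```

Because the correction depends only on the number of aligned residues, comparisons between structures of similar length change little, which is consistent with the normalized and un-normalized results agreeing closely.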

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

      We want to thank the editors for their assessment of our work and highlighting its importance for the understanding of aperiodic neural activity. Additionally, we want to thank the three present and four former reviewers (at a different journal) whose comments and ideas were critical in shaping this manuscript to its current form. We hope that this paper opens up many more questions that will guide us - as a field - to an improved understanding of how “cortical” and “cardiac” changes in aperiodic activity are linked and want to invite readers to engage with our work through eLife’s comment function.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA analysis to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

      We want to thank the reviewer for their comments focussed on improving the overall structure of the manuscript. We agree with their suggestions that some results could be more clearly contextualized and restructured the manuscript accordingly.

      Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper on its own!

      The literature analysis is very clever, and expanded upon from any other prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including now measuring ECG when possible - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

      We want to thank the reviewer for agreeing to review our manuscript again and for recapitulating on their previous comments and the progress the manuscript has made over the course of the last ~2 years. The reviewer's comments have been essential in shaping the manuscript into its current form. Their feedback has made the review process truly feel like a collaborative effort, focused on strengthening the manuscript and refining its conclusions and resulting recommendations.

      Reviewer #3 (Public review):

      Summary:

      Schmidt et al., aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al., used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      We want to thank the reviewer for their thorough engagement with our work and the many great ideas that resulted from it, mentioned both in the Weaknesses section and in the Recommendations for the authors below. Their suggestions have sparked many ideas on how to better separate peripheral from neurophysiological signals, and these are likely to greatly influence our future attempts to extract both cardiac and muscle activity from M/EEG recordings. So we want to thank them for their input, time and effort!

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
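To make the structure of this two-predictor model concrete, here is a minimal sketch of a regression of the form age ~ 1 + ECG rejected + ECG components. It is an illustration only: it uses ordinary least squares rather than the Bayesian robust regression described in the manuscript, and all data, coefficients, and variable names are synthetic assumptions rather than values from the study.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)
n = 500
# Synthetic channel-averaged slopes; the two predictors are deliberately correlated.
slope_rejected = [rng.gauss(0, 1) for _ in range(n)]
slope_component = [0.5 * s + rng.gauss(0, 1) for s in slope_rejected]
# Synthetic ages that depend on both predictors plus noise.
age = [
    50 + 4 * r + 8 * c + rng.gauss(0, 5)
    for r, c in zip(slope_rejected, slope_component)
]

# Design matrix: intercept, ECG rejected slope, ECG component slope.
X = [[1.0, r, c] for r, c in zip(slope_rejected, slope_component)]
intercept, b_rejected, b_component = ols(X, age)
```

Each coefficient estimates the association of one predictor with age while holding the other constant, which is the sense in which the model tests whether the two age effects explain unique variance.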

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      The reviewer makes a good argument that our initial assumption, that the presence of cardiac activity on the reference electrode influences the performance of the ICA, may be wrong. After rereading and reconsidering the matter, we think that the reviewer is correct and that their explanations for why the ECG signal was not so easily separable from our EEG recordings are more plausible and better grounded in the literature than our initial suggestion. We therefore now highlight their view as a main reason why the ECG rejection was more challenging in EEG data. However, we also note that understanding the exact reason is probably an empirical question that demands further research, stating that:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources. ”

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      The reviewer is correct in stating that, if an “artifactual” signal is not always present but appears and disappears (e.g. eye-blinks), neural activity may be distorted in periods where the “artifactual” signal is absent. However, while this plausibly presents a problem for ocular activity, there is no obvious reason to believe that this applies to cardiac activity. While the ECG signal is non-stationary in nature, it is remarkably more stable than eye-movements in the healthy populations we analyzed (especially at rest). Accordingly, the cardiac “artifact” was consistently present across the entirety of the MEG recordings we visually inspected.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      The literature analysis was largely performed automatically and focussed solely on ECG related activity, as described in the methods section under Literature Analysis. If no ECG related terms were used in the context of artifact rejection, a study was flagged as not having removed cardiac activity. We could indeed have highlighted this better and apologize for the oversight. We now additionally link to these details, stating that:

      “However, an analysis of openly accessible M/EEG articles (N<sub>Articles</sub>=279; see Methods - Literature Analysis for further details) that investigate aperiodic activity revealed that only 17.1% of EEG studies explicitly mention that cardiac activity was removed and only 16.5% measure ECG (45.9% of MEG studies removed cardiac activity and 31.1% of MEG studies mention that ECG was measured; see Figure 1EF).”

      The reviewer makes a fair point that there is some uncertainty here, and our results probably present a lower bound of ECG handling in M/EEG research: when I manually rechecked the studies that were not initially flagged, it was often solely mentioned that “artifacts” were rejected. However, this information seemed too ambiguous to assume that cardiac activity was in fact accounted for. Again, this could have been stated more clearly in writing and we apologize for this oversight. This is now included in the methods section Literature Analysis, stating that:

      “All valid word contexts were then manually inspected by scanning the respective word context to ensure that the removal of “artifacts” was related specifically to cardiac and not e.g. ocular activity or the rejection of artifacts in general (without specifying which “artifactual” source was rejected in which case the manuscript was marked as invalid). This means that the results of our literature analysis likely present a lower bound for the rejection of cardiac activity in the M/EEG literature investigating aperiodic activity.”

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).

      In the present study we tried to minimize the risk of type 1 errors by several means: A) weakly informative priors, B) robust regression models and C) a region of practical equivalence (ROPE, see Methods - Statistical Inference for further information) to define meaningful effects.

      Weakly informative priors can lower the risk of type 1 errors arising from multiple testing by shrinking parameter estimates towards zero (see e.g. Lemoine, 2019). Robust regression models use a Student T distribution to describe the distribution of the data. This distribution features heavier tails, meaning it allocates more probability to extreme values, which in turn minimizes the influence of outliers. The ROPE criterion ensures that only effects exceeding a negligible size are considered meaningful, representing a strict and conservative approach to interpreting our findings (see Kruschke 2018, Cohen, 1988).
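The ROPE criterion can be illustrated with a short sketch. It follows the general decision rule described by Kruschke (2018), in which the highest density interval (HDI) of a posterior is compared against the ROPE; whether the manuscript applied exactly this rule, and which ROPE bounds were used, is not stated here, so the bounds of ±0.1 and the simulated posterior samples below are placeholder assumptions.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    width = math.ceil(mass * n)  # number of samples inside the interval
    start = min(
        range(n - width + 1),
        key=lambda i: ordered[i + width - 1] - ordered[i],
    )
    return ordered[start], ordered[start + width - 1]


def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Classify an effect by comparing its HDI with the ROPE."""
    low, high = hdi(samples, mass)
    if low > rope[1] or high < rope[0]:
        return "meaningful"  # HDI lies entirely outside the ROPE
    if low >= rope[0] and high <= rope[1]:
        return "practically equivalent to zero"  # HDI lies entirely inside
    return "undecided"  # HDI and ROPE overlap partially


rng = random.Random(1)
# Simulated posterior draws for a standardized regression coefficient.
posterior = [rng.gauss(0.5, 0.05) for _ in range(4000)]
decision = rope_decision(posterior)
```

The rule is conservative because an effect whose credible interval merely excludes zero is still left undecided unless the whole interval clears the negligible range.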

      Furthermore, and more generally we do not selectively report “significant” effects in the situations in which multiple analyses were conducted on the same family of data (e.g. Figure 2 & 4). Instead we provide joint inference across several plausible analysis options (akin to a specification curve analysis, Simonsohn, Simmons & Nelson 2020) to provide other researchers with an overview of how different analysis choices impact the association between cardiac and neural aperiodic activity.

      Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928.

      Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      We want to thank the reviewer for suggesting this interesting paper, showing that lower high-pass filters may be preferable to the more commonly used >1Hz high-pass filters for detection of ICA components that largely contain peripheral physiological activity. However, the results presented by Bailey et al. contradict the more commonly reported findings by other researchers that a >1Hz high-pass filter is actually preferable (e.g. Winkler et al. 2015; Dimigen, 2020 or Klug & Gramann, 2021) and recommendations in widely used packages for M/EEG analysis (e.g. https://mne.tools/1.8/generated/mne.preprocessing.ICA.html). Yet, the fact that there seems to be a discrepancy suggests that further research is needed to better understand which type of high-pass filtering is preferable in which situation. Furthermore, it is notable that all the findings for high-pass filtering in ICA component detection and removal that we are aware of relate to ocular activity. Given that ocular and cardiac activity have very different temporal and spectral patterns, it is probably worth further investigating whether the classic 1Hz high-pass filter is really also the best option for the detection and removal of cardiac activity. However, in our opinion this requires a dedicated investigation of its own.

      We therefore highlight this now in our manuscript stating that:

      “Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      I. Winkler, S. Debener, K.-R. Müller and M. Tangermann, "On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP," 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 2015, pp. 4101-4105, doi: 10.1109/EMBC.2015.7319296.

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      Klug, M., & Gramann, K. (2021). Identifying key factors for improving ICA‐based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12), 8406-8420.

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results?

      I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

      We thank the reviewer for their suggestion. Muscle activity can indeed be a potential concern for the estimation of the spectral slope. This is precisely why we used head movements (as also noted by the reviewer) as a proxy for muscle activity. We also agree with the reviewer that this is not a perfect estimate. Additionally, the Riemannian Potato would probably only capture epochs that contain transient, but not persistent, patterns of muscle activity.

      The paper recommended by the reviewer contains a clever approach of using the steepness of the spectral slope (or lack thereof) as an indicator whether or not an independent component (IC) is driven by muscle activity. In order to determine an optimal threshold Fitzgibbon et al. compared paralyzed to temporarily non paralyzed subjects. They determined an expected “EMG-free” threshold for their spectral slope on paralyzed subjects and used this as a benchmark to detect IC’s that were contaminated by muscle activity in non paralyzed subjects.

      This is a great idea, but unfortunately it would go well beyond what we are able to sensibly estimate with our data, for the following reasons. Fitzgibbon et al. estimated their optimal threshold on paralyzed subjects for EEG data and show that this is a feasible threshold to be applied across different recordings. So for EEG data it might be feasible, at least as a first attempt, to use their threshold on our data. However, we are measuring MEG and, as alluded to in our discussion section under “Differences in aperiodic activity between magnetic and electric field recordings”, the spectral slope differs greatly between MEG and EEG recordings for non-trivial reasons. Furthermore, the spectral slope even seems to differ across different MEG devices. We noticed this when we initially tried to pool the data recorded in Salzburg with the Cambridge dataset. This means we would need to do a complete validation of this procedure for the MEG data recorded in Cambridge and in Salzburg, which is not feasible considering that we A) don’t have direct access to one of the recording sites and B) would, even if we had access, face substantial hurdles to get ethical approval for the experiment performed by Fitzgibbon et al.

      However, we think the approach brought forward by Fitzgibbon and colleagues is a clever way to remove muscle activity from EEG recordings whenever EMG was not directly recorded. We therefore suggest in the Discussion section that, ideally, EMG should also be recorded, stating that:

      “It is worth noting that, apart from cardiac activity, muscle activity can also be captured in (non-)invasive recordings and may drastically influence measures of the spectral slope(72). To ensure that persistent muscle activity does not bias our results we used changes in head movement velocity as a control analysis (see Supplementary Figure S9). However, it should be noted that this is only a proxy for the presence of persistent muscle activity. Ideally, studies investigating aperiodic activity should also be complemented by measurements of EMG. Whenever such measurements are not available creative approaches that use the steepness of the spectral slope (or the lack thereof) as an indicator to detect whether or not e.g. an independent component is driven by muscle activity are promising(72,73). However, these approaches may require further validation to determine how well myographic aperiodic thresholds are transferable across the wide variety of different M/EEG devices.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) As outlined above, I recommend rephrasing the last section of the introduction to briefly summarize/introduce all main analysis steps undertaken in the study and why these were done (for example, it is only mentioned that the Cam-CAN dataset was used to study the impact of cardiac on MEG activity although the author used a variety of different datasets). Similarly, I am missing an overview of all main findings in the context of the study goals in the discussion. I believe clarifying the structure of the paper would not only provide a red thread to the reader but also highlight the efforts/strength of the study as described above.

      This is a good call! As suggested by the reviewer we now try to give a clearer overview of what was investigated and why. We do that both at the end of the introduction, stating that: “Using the publicly available Cam-CAN dataset(28,29), we find that the aperiodic signal measured using M/EEG originates from multiple physiological sources. In particular, significant portions of age-related changes in aperiodic activity –normally attributed to neural processes– can be better explained by cardiac activity. This observation holds across a wide range of processing options and control analyses (see Supplementary S1), and was replicable on a separate MEG dataset. However, the extent to which cardiac activity accounts for age-related changes in aperiodic activity varies with the investigated frequency range and recording site. Importantly, in some frequency ranges and sensor locations, age-related changes in neural aperiodic activity still prevail. But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging. In sum, our results highlight the complexity of aperiodic activity while cautioning against interpreting it as solely “neural” without considering physiological influences.”

      and at the beginning of the discussion section:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources (see Figure 1EF). Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      (2) I found it interesting that the spectral slopes of ECG activity at higher frequency ranges (> 10 Hz) seem mostly related to HRV measures such as fractal and time domain indices and less so with frequency-domain indices. Do the authors have an explanation for why this is the case? Also, the analysis of the HRV measures and their association with aperiodic ECG activity is not explained in any of the method sections.

      We apologize for the oversight in not mentioning the HRV analysis in more detail in our methods section. We added a subsection to the Methods section entitled ECG Processing - Heart rate variability analysis to further describe the HRV analyses.

      “ECG Processing - Heart rate variability analysis

      Heart rate variability (HRV) was computed using the NeuroKit2 toolbox, a high level tool for the analysis of physiological signals. First, the raw electrocardiogram (ECG) data were preprocessed by high-pass filtering the signal at 0.5 Hz using an infinite impulse response (IIR) Butterworth filter (order = 5) and by smoothing the signal with a moving average kernel with the width of one period of 50 Hz to remove the powerline noise (default settings of neurokit.ecg.ecg_clean). Afterwards, QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal. Subsequently, R-peaks were detected as local maxima in the QRS complexes (default settings of neurokit.ecg.ecg_peaks; see (98) for a validation of the algorithm). From the cleaned R-R intervals, 90 HRV indices were derived, encompassing time-domain, frequency-domain, and non-linear measures. Time-domain indices included standard metrics such as the mean and standard deviation of the normalized R-R intervals, the root mean square of successive differences, and other statistical descriptors of interbeat interval variability. Frequency-domain analyses were performed using power spectral density estimation, yielding for instance low frequency (0.04-0.15 Hz) and high frequency (0.15-0.4 Hz) power components. Additionally, non-linear dynamics were characterized through measures such as sample entropy, detrended fluctuation analysis and various Poincaré plot descriptors. All these measures were then related to the slopes of the low frequency (0.25 – 20 Hz) and high frequency (10 – 145 Hz) aperiodic spectrum of the raw ECG.”
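
      To make the time-domain indices named above concrete, the following is a minimal sketch of how the mean R-R interval, the standard deviation of R-R intervals (SDNN) and the root mean square of successive differences (RMSSD) can be computed. It is an illustration only and not the NeuroKit2 implementation used in the study; the function name `time_domain_hrv` and the example intervals are hypothetical.

```python
import math


def time_domain_hrv(rr_ms):
    """Return basic time-domain HRV indices from R-R intervals (milliseconds)."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three R-R intervals")
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the R-R intervals
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive R-R differences
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}


# Hypothetical R-R intervals in milliseconds (not data from the study)
rr_example = [800, 810, 790, 805, 795]
indices = time_domain_hrv(rr_example)
print(indices)
```

      These indices summarize beat-to-beat variability on a time scale of seconds, which is far slower than the 10 – 145 Hz range in which the high-frequency ECG slopes were fitted.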

      Regarding the association between the ECG’s spectral slopes at high frequencies and frequency-domain indices of heart rate variability: common frequency-domain indices of heart rate variability fall in the range of 0.01-0.4 Hz, which probably explains why we did not notice any association at higher frequency ranges (>10 Hz).

      This is also stated in the related part of the results section:

      “In the higher frequency ranges (10 - 145 Hz) spectral slopes were most consistently related to fractal and time domain indices of heart rate variability, but not so much to frequency-domain indices assessing spectral power in frequency ranges < 0.4 Hz.”

      (3) Related to the previous point - what is being reflected in the ECG at higher frequency ranges, with regard to biological mechanisms? Results are being mentioned, but not further discussed. However, this point seems crucial because the age effects across the four datasets differ between low and high-frequency slope limits (Figure 2C).

      This is a great question that definitely requires further attention and investigation in general (see also Tereshchenko & Josephson, 2015). We investigated the change of the slope across frequency ranges that are typically captured in common ECG setups for adults (0.05 - 150 Hz; Tereshchenko & Josephson, 2015; Kusayama, Wong, Liu et al. 2020). While most of the physiologically significant spectral information of an ECG recording rests between 1-50 Hz (Clifford & Azuaje, 2006), meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250 Hz) that overlaps with our spectral analysis window. Further meaningful information can be extracted at even higher frequencies (>100 Hz). The exact physiological mechanisms underlying the so-called high-frequency QRS (HF-QRS) remain unclear (see Tereshchenko & Josephson, 2015; Qiu et al. 2024 for a review discussing possible mechanisms). At the same time, the HF-QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range (Schlegel et al. 2004; Qiu et al. 2024). All optimism aside, it is also worth noting that ECG recordings at higher frequencies can capture skeletal muscle activity with an overlapping frequency range up to 400 Hz (Kusayama, Wong, Liu et al. 2020). We now highlight all of this as an outstanding research question when introducing this analysis in the results section, stating that:

      “However, substantially less is known about aperiodic activity above 0.4Hz in the ECG. Yet, common ECG setups for adults capture activity at a broad bandwidth of 0.05 - 150Hz(33,34).

      Importantly, a lot of the physiological meaningful spectral information rests between 1-50Hz(35), similarly to M/EEG recordings. Furthermore, meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz(35)). However, that’s not all, as further meaningful information can be extracted at even higher frequencies (>100Hz). For instance, the so-called high-frequency QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range(36,37). Yet, the exact physiological mechanisms underlying the high-frequency QRS remain unclear (see (37) for a review discussing possible mechanisms). ”

      Tereshchenko, L. G., & Josephson, M. E. (2015). Frequency content and characteristics of ventricular conduction. Journal of electrocardiology, 48(6), 933-937.

      Kusayama, T., Wong, J., Liu, X. et al. Simultaneous noninvasive recording of electrocardiogram and skin sympathetic nerve activity (neuECG). Nat Protoc 15, 1853–1877 (2020). https://doi.org/10.1038/s41596-020-0316-6

      Clifford, G. D., & Azuaje, F. (2006). Advanced methods and tools for ECG data analysis (Vol. 10). P. McSharry (Ed.). Boston: Artech house.

      Qiu, S., Liu, T., Zhan, Z., Li, X., Liu, X., Xin, X., ... & Xiu, J. (2024). Revisiting the diagnostic and prognostic significance of high-frequency QRS analysis in cardiovascular diseases: a comprehensive review. Postgraduate Medical Journal, qgae064.

      Schlegel, T. T., Kulecz, W. B., DePalma, J. L., Feiveson, A. H., Wilson, J. S., Rahman, M. A., & Bungo, M. W. (2004, March). Real-time 12-lead high-frequency QRS electrocardiography for enhanced detection of myocardial ischemia and coronary artery disease. In Mayo Clinic Proceedings (Vol. 79, No. 3, pp. 339-350). Elsevier.

      (4) Page 10: At first glance, it is not quite clear what is meant by "processing option" in the text. Please clarify.

      Thank you for catching this! Upon re-reading, this is indeed a bit unclear. We have now replaced “processing options” with “slope fits” to make it clearer that we are talking about the percentage of effects based on the different slope fits.

      (5) The authors mention previous findings on age effects on neural 1/f activity (References Nr 5,8,27,39) that seem contrary to their own findings such as e.g., the mostly steepening of the slopes with age. Also, the authors discuss thoroughly why spectral slopes derived from MEG signals may differ from EEG signals. I encourage the authors to have a closer look at these studies and elaborate a bit more on why these studies differ in their conclusions on the age effects. For example, Tröndle et al. (2022, Ref. 39) investigated neural activity in children and young adults, hence, focused on brain maturation, whereas the CamCAN set only considers the adult lifespan. In a similar vein, others report age effects on 1/f activity in much smaller samples as reported here (e.g., Voytek et al., 2015).

      I believe taking these points into account by briefly discussing them, would strengthen the authors' claims and provide a more fine-grained perspective on aging effects on 1/f.

      The reviewer is making a very important point, as age-related differences in (neuro-)physiological activity are not necessarily strictly comparable and entirely linear across different age cohorts (e.g. age-related changes in alpha center frequency). We therefore added the suggested discussion points to the discussion section.

      “Differences in electric and magnetic field recordings aside, aperiodic activity may not change strictly linearly as we are ageing and studies looking at younger age groups (e.g. <22; (44) may capture different aspects of aging (e.g. brain maturation), than those looking at older subjects (>18 years; our sample). A recent report even shows some first evidence of an interesting putatively non-linear relationship with age in the sensorimotor cortex for resting recordings(59)”

      (6) The analysis of the working memory paradigm as described in the outlook-section of the discussion comes as a bit of a surprise as it has not been introduced before. If the authors want to convey with this study that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity, I recommend introducing this analysis and the results earlier in the manuscript than only in the discussion to strengthen their message.

      The reviewer is correct. This analysis really comes a bit out of the blue. However, this was also exactly the intention for placing this analysis in the discussion. As the reviewer correctly noted, the aim was to suggest “that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity”. We placed this outlook directly after the discussion of “(neuro-)physiological origins of aperiodic activity”, where we highlight the potential challenges of interpreting drug induced changes to M/EEG recordings. So the aim was to get the reader to think about whether age is the only feature affected by cardiac activity and then directly present some evidence that this might go beyond age.

      However, we have rethought this approach based on the reviewer's comments, moved that paragraph to the end of the results section accordingly, and now introduce it at the end of the introduction, stating that:

      “But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging.”

      (7) The font in Figure 2 is a bit hard to read (especially in D). I recommend increasing the font sizes where necessary for better readability.

      We agree with the Reviewer and increased the font sizes accordingly.

      (8) Text in the discussion: Figure 3B on page 10 => shouldn't it be Figure 4?

      Thank you for catching this oversight. We have now corrected this mistake.

      (9) In the third section on page 10, the Figure labels seem to be confused. For example, Figure 4 E is supposed to show "steepening effects", which should be Figure 4B I believe.

      Please check the figure labels in this section to avoid confusion.

      Thank you for catching this oversight. We have now corrected this mistake.

      (10) Figure Legend 4 I), please check the figure labels in the text

      Thank you for catching this oversight. We have now corrected this mistake.

      Reviewer #3 (Recommendations for the authors):

      I have a number of suggestions for improving the manuscript, which I have divided by section in the following:

      ABSTRACT:

      I would suggest re-writing the first sentences to make it easier to read for non-expert readers: "The power of electrophysiologically measured cortical activity decays with an approximately 1/fX function. The slope of this decay (i.e. the spectral exponent, X) is modulated..."

      Thank you for the suggestion. We adjusted the sentence as suggested to make it easier for less technical readers to understand that “X” refers to the exponent.

      Including the age range that was studied in the abstract could be informative.

      Done as suggested.

      As an optional recommendation, I think it would increase the impact of the article if the authors note in the abstract that the current most commonly applied cardiac artifact reduction approaches don't resolve the issue for EEG data, likely due to an imperfect ability to separate the cardiac artifact from the neural activity with independent component analysis. This would highlight to the reader that they can't just expect to address these concerns by cleaning their data with typical cleaning methods.

      I think it would also be useful to convey in the abstract just how comprehensive the included analyses were (in terms of artifact reduction methods tested, different aperiodic algorithms and frequency ranges, and both MEG and EEG). Doing so would let the reader know just how robust the conclusions are likely to be.

      This is a brilliant idea! As suggested we added a sentence highlighting that simply performing an ICA may not be sufficient to separate cardiac contributions to M/EEG recordings and refer to the comprehensiveness of the performed analyses.

      INTRODUCTION:

      I would suggest re-writing the following sentence for readability: "In the past, aperiodic neural activity, other than periodic neural activity (local peaks that rise above the "power-law" distribution), was often treated as noise and simply removed from the signal"

      To something like: "In the past, aperiodic neural activity was often treated as noise and simply removed from the signal e.g. via pre-whitening, so that analyses could focus on periodic neural activity (local peaks that rise above the "power-law" distribution, which are typically thought to reflect neural oscillations)."

      We are happy to follow that suggestion.

      Page 3: please provide the number of articles that were included in the examination of the percentage that remove cardiac activity, and note whether the included articles could be considered a comprehensive or nearly comprehensive list, or just a representative sample.

      We stated the exact number of articles in the methods section under Literature Analysis. However, we added it to the Introduction on page 3 as suggested by the reviewer. The selection of articles was done automatically, dependent on a list of pre-specified terms and exclusively focussed on articles that had terms related to aperiodic activity in their title (see Literature Analysis). Therefore, I would personally be hesitant in calling it a comprehensive or nearly comprehensive list of the general M/EEG literature as the analysis of aperiodic activity is still relatively niche compared to the more commonly investigated evoked potentials or oscillations. I think whether or not a reader perceives our analysis as comprehensive should be up to them to decide and does not reflect something I want to impose on them. This is exacerbated by the fact that the analysis of neural aperiodic activity has rapidly gained traction over the last years (see Figure 1D orange) and the literature analysis was performed almost 2 years ago and therefore, in my eyes, only represents a glimpse in the rapidly evolving field related to the analysis of aperiodic activity.

      Figure 1E-F: It's not completely clear that the "Cleaning Methods" part of the figure indicates just methods to clean the cardiac artifact (rather than any artifact). It also seems that ~40% of EEG studies do not apply any cleaning methods even from within the studies that do clean the cardiac artifact (if I've read the details correctly). This seems unlikely. Perhaps there should be a bar for "other methods", or "unspecified"? Having said that, I'm quite familiar with the EEG artifact reduction literature, and I would be very surprised if ~40% of studies cleaned the cardiac artifact using a different method to the methods listed in the bar graph, so I'm wondering if I've misunderstood the figure, or whether the data capture is incomplete / inaccurate (even though the conclusion that ICA is the most common method is almost certainly accurate).

      The cleaning methods shown indeed refer to cardiac activity specifically. This was also mentioned in the caption of Figure 1: “We were further interested in determining which artifact rejection approaches were most commonly used to remove cardiac activity, such as independent component analysis (ICA(22)), singular value decomposition (SVD(23)), signal space separation (SSS(24)), signal space projections (SSP(25)) and denoising source separation (DSS(26)).” and in the methods section under Literature Analysis. Nevertheless, we adjusted Figure 1EF to make it more obvious that the described cleaning methods relate only to the ECG. Aside from blind source separation techniques such as ICA, a good number of studies mentioned that they cleaned their data based on visual inspection (which was not considered further). Furthermore, studies were only marked as having separated cardiac from neural activity when this was mentioned explicitly.

      RESULTS:

      Page 6: I would delete the "from a neurophysiological perspective" clause, which makes the sentence more difficult to read and isn't so accurate (frequencies 13-25Hz would probably more commonly be considered mid-range rather than low or high). Additionally, both frequency ranges include 15Hz, but the next sentence states that the ranges were selected to avoid the knee at 15Hz, which seems to be a contradiction. Could the authors explain in more detail how the split addresses the 15Hz knee?

      We removed the “from a neurophysiological perspective” clause as suggested. With regard to the “knee” at ~15 Hz, I would like to refer the reviewer to Supplementary Figure S1. The knee frequency varies substantially across subjects, so splitting the data at only one exact frequency did not seem appropriate. Additionally, we found only spurious significant age-related variations in knee frequency (i.e. in only one of the 4 datasets; not shown).

      Furthermore, we wanted to better connect our findings to our MEG results in Figure 4 and also give the readers a holistic overview of how different frequency ranges in the aperiodic ECG would be affected by age. So to fulfill all of these objectives we decided to fit slopes with respective upper/lower bounds around a range of 5Hz above and below the average 15Hz Knee Frequency across datasets.

      The later parts of this same paragraph refer to a vast amount of different frequency ranges, but only the "low" and "high" frequency ranges were previously mentioned. Perhaps the explanation could be expanded to note that multiple lower and upper bounds were tested within each of these low and high frequency windows?

      This is a good catch; we adjusted the sentence as suggested. We now write: “.. slopes were fitted individually to each subject's power spectrum in several lower (0.25 – 20 Hz) and higher (10-145 Hz) frequency ranges.”
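
      As an illustration of what fitting a slope within a given frequency range means, the sketch below fits a straight line to a power spectrum in log-log space over a lower and a higher range. This is a simplified stand-in and not the parametrization used in the manuscript (which relies on algorithms such as IRASA or FOOOF); the function name `fit_spectral_slope` and the synthetic spectrum are assumptions made for the example.

```python
import math


def fit_spectral_slope(freqs, power, fmin, fmax):
    """Least-squares line through log10(power) vs. log10(freq) within [fmin, fmax].

    Returns (slope, intercept); a steeper spectrum gives a more negative slope.
    """
    pts = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if fmin <= f <= fmax and p > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two frequency bins in the fitting range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic spectrum following 1/f^1.5 from 0.25 to 145 Hz (not study data)
freqs = [0.25 * k for k in range(1, 581)]
power = [f ** -1.5 for f in freqs]

low_slope, _ = fit_spectral_slope(freqs, power, 0.25, 20)   # lower range
high_slope, _ = fit_spectral_slope(freqs, power, 10, 145)   # higher range
print(low_slope, high_slope)
```

      On a pure power-law spectrum both ranges return the same slope; on real data with a knee, the two ranges can yield different slopes, which is why several lower and upper bounds were examined.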

      The following two sentences seem to contradict each other: "Overall, spectral slopes in lower frequency ranges were more consistently related to heart rate variability indices(> 39.4% percent of all investigated indices)" and: "In the lower frequency range (0.25 - 20Hz), spectral slopes were consistently related to most measures of heart rate variability; i.e. significant effects were detected in all 4 datasets (see Figure 2D)." (39.4% is not "most").

      The reviewer is correct in stating that 39.4% is not most. However, the 39.4% is the lowest bound and only refers to 1 dataset. In the other 3 datasets the percentage of effects was above 64% which can be categorized as “most” i.e. above 50%. We agree that this was a bit ambiguous in the sentence so we added the other percentages as well as a reference to Figure 2D to make this point clearer.

      Figure 2D: it isn't clear what the percentages in the semi-circles reflect, nor why some semi-circles are more full circles while others are only quarter circles.

      The percentages in the semi-circles reflect the proportion of effects (marked in red) and null effects (marked in green) per dataset, averaged across the different measures of HRV. For some frequency ranges fewer effects were found, resulting in quarter circles instead of semi-circles.

      Page 8: I think the authors could make it more clear that one of the conditions they were testing was the ECG component of the EEG data (extracted by ICA then projected back into the scalp space for the temporal response function analysis).

      As suggested by the reviewer we adjusted our wording and replaced the arguably a bit ambiguous “... projected back separately” with “... projected back into the sensor space”. We thank the reviewer for this recommendation, as it does indeed make it easier to understand the procedure.

      “After pre-processing (see Methods) the data was split in three conditions using an ICA(22). Independent components that were correlated (at r > 0.4; see Methods: MEG/EEG Processing - pre-processing) with the ECG electrode were either not removed from the data (Figure 3ABCD - blue), removed from the data (Figure 2ABCD - orange) or projected back into the sensor space (Figure 3ABCD - green).”
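
      The selection step described in this passage can be sketched as follows: each independent component's time course is correlated with the ECG channel, and components exceeding the threshold are flagged as cardiac. This is a minimal sketch under stated assumptions, not the pipeline used in the study; the helper names are hypothetical, and taking the absolute value of r is our assumption here (the sign of an ICA component is arbitrary).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def flag_cardiac_components(components, ecg, threshold=0.4):
    """Return indices of components whose |r| with the ECG exceeds the threshold."""
    return [
        i for i, comp in enumerate(components)
        if abs(pearson_r(comp, ecg)) > threshold
    ]


# Toy signals (not study data): a spike train standing in for the ECG,
# one component dominated by it (with flipped sign) and one unrelated oscillation.
ecg = [1.0 if t % 10 == 0 else 0.0 for t in range(100)]
cardiac_like = [-2.0 * v + 0.1 * math.sin(0.3 * t) for t, v in enumerate(ecg)]
neural_like = [math.sin(0.3 * t) for t in range(100)]

flagged = flag_cardiac_components([cardiac_like, neural_like], ecg)
print(flagged)
```

      The flagged components are the ones that are then either left in the data, removed from it, or projected back into sensor space on their own to form the three conditions.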

      Figure 4A: standardized beta coefficients for the relationship between age and spectral slope could be noted to provide improved clarity (if I'm correct in assuming that is what they reflect).

      This was indeed shown in Figure 4A and noted in the color bar as “average beta (standardized)”. We do not specifically highlight this in the text, because the exact coefficients would depend both on the analyzed frequency range and on the selected electrodes.

      Figure 4I: The regressions explained at this point seems to contain a very large number of potential predictors, as I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions? (if that is not the case, it could be explained in greater detail). I'm also not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including them back into the same regression model. It seems that there could be some circularity or redundancy in doing so. However, I'm not confident that this is an issue, so would appreciate the authors explaining why it this is a valid approach (if that is the case).

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
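
      For readers less familiar with Wilkinson notation, the model age ~ 1 + ECG rejected + ECG components is an ordinary least-squares regression with an intercept and two predictors. The following minimal sketch shows such a fit on made-up numbers; the function name `ols_two_predictors` and all values are hypothetical and do not reproduce the reported analysis.

```python
def ols_two_predictors(y, x1, x2):
    """Ordinary least squares for y ~ 1 + x1 + x2; returns (b0, b1, b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Each coefficient is the effect of one predictor with the other held constant
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical channel-averaged spectral slopes per subject (not study data)
slope_ecg_rejected = [-1.2, -1.0, -0.9, -1.5, -1.1, -0.8]
slope_ecg_component = [-2.0, -1.5, -1.8, -1.6, -2.2, -1.7]
# Ages constructed so that both predictors contribute independently
age = [40 - 10 * a - 5 * b for a, b in zip(slope_ecg_rejected, slope_ecg_component)]

b0, b_rejected, b_component = ols_two_predictors(age, slope_ecg_rejected, slope_ecg_component)
print(b0, b_rejected, b_component)
```

      If a predictor keeps a non-zero coefficient while the other is in the model, it explains variance in age that the other does not, which is the sense in which the two effects are treated as independent.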

      The explanation of results for relationships between spectral slopes and aging reported in Figure 4 refers to clusters of effects, but the statistical inference methods section doesn't explain how these clusters were determined.

      The wording of “cluster” was used to describe a “category” of effects e.g. null effects. We changed the wording from “cluster” to “category” to make this clearer stating now that: “This analysis, which is depicted in Figure 4, shows that over a broad amount of individual fitting ranges and sensors, aging resulted in a steepening of spectral slopes across conditions (see Figure 4E) with “steepening effects” observed in 25% of the processing options in MEG<sub>ECG not rejected</sub> , 0.5% in MEG<sub>ECG rejected</sub>, and 60% for MEG<sub>ECG components</sub>. The second largest category of effects were “null effects” in 13% of the options for MEG<sub>ECG not rejected</sub> , 30% in MEG<sub>ECG rejected</sub>, and 7% for MEG<sub>ECG components</sub>. ”

      Page 12: can the authors clarify whether these age related steepenings of the spectral slope in the MEG are when the data include the ECG contribution, or when the data exclude the ECG? (clarifying this seems critical to the message the authors are presenting).

      We apologize for not making this clearer. We now write: “This analysis also indicates that a vast majority of observed effects irrespective of condition (ECG components, ECG not rejected, ECG rejected) show a steepening of the spectral slope with age across sensors and frequency ranges.”

      Page 13: I think it would be useful to describe how much variance was explained by the MEG-ECG rejected vs MEG-ECG component conditions for a range of these analyses, so the reader also has an understanding of how much aperiodic neural activity might be influenced by age (vs if the effects are really driven mostly by changes in the ECG).

      With regard to the explained variance, I think that the very important question of how strongly age influences changes in aperiodic activity is a topic better suited for a meta-analysis, as the effect sizes seem to vary largely depending on the sample: e.g. for EEG, results in the literature were reported at r=-0.08 (Cesnaite et al. 2023), r=-0.26 (Cellier et al. 2021), r=-0.24/r=-0.28/r=-0.35 (Hill et al. 2022) and r=0.5/r=0.7 (Voytek et al. 2015). I would refer the reader/reviewer to the standardized beta coefficients depicted in Figure 4A as a measure of effect size in the current study.

      Cellier, D., Riddle, J., Petersen, I., & Hwang, K. (2021). The development of theta and alpha neural oscillations from ages 3 to 24 years. Developmental cognitive neuroscience, 50, 100969.

      Cesnaite, E., Steinfath, P., Idaji, M. J., Stephani, T., Kumral, D., Haufe, S., ... & Nikulin, V. V. (2023). Alterations in rhythmic and non‐rhythmic resting‐state EEG activity and their link to cognition in older age. NeuroImage, 268, 119810.

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076.

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38), 13257-13265.

      Also, if there are specific M/EEG sensors where the 1/f activity does relate strongly to age, it would be worth noting these, so future research could explore those sensors in more detail.

      I think it is difficult to make a clear claim about this for MEG data, as the exact location or type of the sensor may differ across manufacturers. Such a statement could be made more easily for source-projected data, or if EEG electrodes were available, where the location would be standardized, e.g. according to the 10-20 system.

      DISCUSSION:

      Page 15: Please change the wording of the following sentence, as the way it is currently worded seems to suggest that the authors of the current manuscript have demonstrated this point (which I think is not the case): "The authors demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods."

      Apologies for the oversight! The reviewer is correct: this was not shown by us, but by the authors of the cited manuscript. We corrected the sentence as suggested, stating now that:

      “Bénar et al. demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods.”

      Page 16: The authors mention the results can be sensitive to the application of SSS to clean the MEG data, but not ICA. I think it would be sensitive to the application of either SSS or ICA?

      This is correct and actually also supported by Figure S7, as differences in ICA thresholds affect also the detection of age-related effects. We therefore adjusted the related sentences stating now that:

      “ In case of the MEG signal this may include the application of Signal-Space-Separation algorithms (SSS(24,55)), different thresholds for ICA component detection (see Figure S7), high and low pass filtering, choices during spectral density estimation (window length/type etc.), different parametrization algorithms (e.g. IRASA vs FOOOF) and selection of frequency ranges for the aperiodic slope estimation.”

      It would be worth clarifying that the linked mastoid re-reference alone has been proposed to cancel out the ECG signal, rather than that a linked-mastoid re-reference improves the performance of the ICA separation (which could be inferred by the explanation as it's currently written).

      This is correct and we adjusted the sentence accordingly, stating now that:

      “ Previous work(12,56) has shown that a linked mastoid reference alone was particularly effective in reducing the impact of ECG related activity on aperiodic activity measured using EEG. “

      The issue of the number of EEG channels could probably just be noted as a potential limitation, as could the issue of neural activity being mixed into the ECG component (although this does pose a potential confound to the M/EEG without ECG condition, I suspect it wouldn't be critical).

      This is indeed a very fair point as a higher amount of electrodes would probably make it easier to better isolate ECG components in the EEG, which may be the reason why the separation did not work so well in our case. However, this is ultimately an empirical question so we highlighted it in the discussion section stating that: “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources. ”

      OUTLOOK:

      Page 19: Although there has been a recent trend to control for 1/f activity when examining oscillatory power, recent research suggests that this should only be implemented in specific circumstances, otherwise the correction causes more of a confound than the issue does. It might be worth considering this point with regards to the final recommendation in the Outlook section: Brake, N., Duc, F., Rokos, A., Arseneau, F., Shahiri, S., Khadra, A., & Plourde, G. (2024). A neurophysiological basis for aperiodic EEG and the background spectral trend. Nature Communications, 15(1), 1514.

      We want to thank the reviewer for recommending this very interesting paper! The authors of said paper present compelling evidence showing that, while peak detection above an aperiodic trend using methods like FOOOF or IRASA is a prerequisite to determine the presence of oscillatory activity, it’s not necessarily straightforward to determine which detrending approach should be applied to determine the actual power of an oscillation. Furthermore, the authors suggest that wrongfully detrending may cause larger errors than not detrending at all. We therefore added a sentence stating that: “However, whether or not periodic activity (after detection) should be detrended using approaches like FOOOF or IRASA still remains disputed, as incorrectly detrending the data may cause larger errors than not detrending at all(75).”

      RECOMMENDATIONS:

      Page 20: "measure and account for" seems like it's missing a word, can this be re-written so the meaning is more clear?

      Done as suggested. The sentence now states: “To better disentangle physiological and neural sources of aperiodic activity, we propose the following steps to (1) measure and (2) account for physiological influences.”

      I would re-phrase "doing an ICA" to "reducing cardiac artifacts using ICA" (this wording could be changed in other places also).

      I do not like to describe cardiac or ocular activity as artifactual per se. This is also why I used hyphens whenever I mention the word “artifact” in association with the ECG or EOG. However, I do understand that the wording of “doing an ICA” is a bit sloppy. We therefore reworded it accordingly throughout the manuscript to e.g. “separating cardiac from neural sources using an ICA” and “separating physiological from neural sources using an ICA”.

      I would additionally note that even if components are identified as unambiguously cardiac, it is still likely that neural activity is mixed in, and so either subtracting or leaving the component will both be an issue (https://doi.org/10.1101/2024.06.06.597688). As such, even perfect identification of whether components are cardiac or not would still mean the issue remains (and this issue is also consistent across a considerable range of component based methods). Furthermore, current methods including wavelet transforms on the ICA component still do not provide good separation of the artifact and neural activity.

      This is definitely a fair point and we also highlight this in our recommendations under 3 stating that:

      “However, separating physiological from neural sources using an ICA is no guarantee that peripheral physiological activity is fully removed from the cortical signal. Even more sophisticated ICA based methods that e.g. apply wavelet transforms on the ICA components may still not provide a good separation of peripheral physiological and neural activity76,77. This turns the process of deciding whether or not an ICA component is e.g. either reflective of cardiac or neural activity into a challenging problem. For instance, when we only extract cardiac components using relatively high detection thresholds (e.g. r > 0.8), we might end up misclassifying residual cardiac activity as neural. In turn, we can’t always be sure that using lower thresholds won’t result in misinterpreting parts of the neural effects as cardiac. Both ways of analyzing the data can potentially result in misconceptions.”

      Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of neuroscience methods, 158(2), 300-312.

      Bailey, N. W., Hill, A. T., Godfrey, K., Perera, M. P. N., Rogasch, N. C., Fitzgibbon, B. M., & Fitzgerald, P. B. (2024). EEG is better when cleaning effectively targets artifacts. bioRxiv, 2024-06.
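
      The threshold trade-off described in the quoted passage can be illustrated with a minimal sketch. It assumes that a component is labelled cardiac when the absolute Pearson correlation between its time course and the ECG exceeds a cutoff. The function names, the toy signals, and the second cutoff of 0.4 are illustrative assumptions and are not taken from the analysis pipeline.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def flag_cardiac(components, ecg, threshold):
    """Indices of components whose |r| with the ECG exceeds the threshold."""
    return [i for i, comp in enumerate(components)
            if abs(pearson_r(comp, ecg)) > threshold]

# Toy data (hypothetical): a spiky "ECG" and a slower "neural" rhythm.
n = 200
ecg = [1.0 if t % 20 == 0 else 0.0 for t in range(n)]
neural = [math.sin(2 * math.pi * t / 37) for t in range(n)]

components = [
    [e + 0.05 * s for e, s in zip(ecg, neural)],  # almost purely cardiac
    [e + 0.4 * s for e, s in zip(ecg, neural)],   # cardiac and neural mixed
    neural,                                        # purely neural
]

strict = flag_cardiac(components, ecg, threshold=0.8)   # misses the mixed one
lenient = flag_cardiac(components, ecg, threshold=0.4)  # also flags the mixed one
```

      In this toy case the strict cutoff leaves the mixed component in the data, so residual cardiac activity is treated as neural, whereas the lenient cutoff removes it together with its neural part. Neither choice separates the two sources.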

      METHODS:

      Pre-processing, page 24: I assume the symmetric setting of fastica was used (rather than the deflation setting), but this should be specified.

      Indeed the reviewer is correct: we used the standard setting of FastICA implemented in MNE-Python, which calls the FastICA implementation in sklearn that by default uses the “parallel” (symmetric) algorithm to compute an ICA. We added this information to the text accordingly, stating that:

      “For extracting physiological “artifacts” from the data, 50 independent components were calculated using the fastica algorithm(22) (implemented in MNE-Python version 1.2; with the parallel/symmetric setting; note: 50 components were selected for MEG for computational reasons; for the analysis of EEG data no threshold was applied).”

      Temporal response functions, page 26: can the authors please clarify whether the TRF is computed against the ECG signal for each electrode or sensor independently, or if all electrodes/sensors are included in the analysis concurrently? I'm assuming it was computed for each electrode and sensor separately, since the TRF was computed in both the forward and backwards direction (perhaps the meaning of forwards and backwards could be explained in more detail also - i.e. using the ECG to predict the EEG signal, or using the EEG signal to predict the ECG signal?).

      A TRF can also be conceptualized as a multiple regression model over time lags. This means that we used all channels to compute the forward and backward models. In the case of the forward model we predicted the signal of the M/EEG channels in a multivariate regression model using the ECG electrode as predictor. In case of the backward model we predicted the ECG electrode based on the signal of all M/EEG channels. The forward model was used to depict the time window at which the ECG signal was encoded in the M/EEG recording, which appears at 0 time lags indicating volume conduction. The backward model was used to see how much information of the ECG was decodable by taking the information of all channels.

      We tried to further clarify this approach in the methods section stating that:

      “We calculated the same model in the forward direction (encoding model; i.e. predicting M/EEG data in a multivariate model from the ECG signal) and backward direction (decoding model; i.e. predicting the ECG signal using all M/EEG channels as predictors).”
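
      The two model directions can be sketched as a regression over time lags. This is a minimal illustration that assumes an ordinary least-squares fit on simulated data; the lag range, the simulated per-channel gains, and the helper names are assumptions made for the example, and the actual analysis may have used regularised regression in a dedicated toolbox.

```python
import numpy as np

def lagged_design(x, lags):
    """Stack time-shifted copies of x (n_times, n_features) column-wise.

    A lag of k means the predictor at time t - k is used for the target at
    time t; samples shifted in from outside the recording are set to 0.
    """
    cols = []
    for lag in lags:
        shifted = np.zeros_like(x)
        if lag > 0:
            shifted[lag:] = x[:-lag]
        elif lag < 0:
            shifted[:lag] = x[-lag:]
        else:
            shifted[:] = x
        cols.append(shifted)
    return np.hstack(cols)

def fit_trf(predictors, targets, lags):
    """Least-squares fit over time lags; returns weights and predictions."""
    design = lagged_design(predictors, lags)
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights, design @ weights

rng = np.random.default_rng(0)
n_times, n_channels = 2000, 8
ecg = rng.standard_normal((n_times, 1))
gains = rng.uniform(0.2, 1.0, size=(1, n_channels))  # hypothetical ECG pickup
meeg = ecg @ gains + rng.standard_normal((n_times, n_channels))

lags = range(-2, 3)

# Forward (encoding) model: the ECG predicts all M/EEG channels.
w_fwd, _ = fit_trf(ecg, meeg, lags)
kernel = w_fwd.reshape(len(lags), n_channels)  # rows: lags, columns: channels
peak_lag = list(lags)[int(np.argmax(np.abs(kernel).mean(axis=1)))]

# Backward (decoding) model: all M/EEG channels predict the ECG.
_, ecg_hat = fit_trf(meeg, ecg, lags)
decoding_r = np.corrcoef(ecg_hat[:, 0], ecg[:, 0])[0, 1]
```

      Because the simulated pickup is instantaneous, the forward kernel peaks at lag 0, which is the pattern attributed above to volume conduction. The backward model summarises how much of the ECG is recoverable from all channels taken together.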

      Page 27: the ECG data was fit using a knee, but it seems the EEG and MEG data was not.

      Does this difference pose any potential confound to the conclusions drawn? (having said this, Figure S4 suggests perhaps a knee was tested in the M/EEG data, which should perhaps be explained in the text also).

      This was indeed tested in a previous review round to ensure that our results are not dependent on the presence/absence of a knee in the data. We therefore added figure S4, but forgot to actually add a description in the text. We are sorry for this oversight and added a paragraph to S1 accordingly:

      “Using FOOOF(5), we also investigated the impact of different slope fitting options (fixed vs. knee model fits) on the aperiodic age relationship (see Supplementary Figure S4). The results that we obtained from these analyses using FOOOF offer converging evidence with our main analysis using IRASA.”
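
      For readers unfamiliar with the two fitting options, the difference can be sketched as follows. The parameterisation follows the aperiodic component as described in the FOOOF documentation (offset, knee, exponent); the function name and the example values are illustrative assumptions.

```python
import math

def aperiodic_log_power(freq, offset, exponent, knee=0.0):
    """Log10 power of the aperiodic component at frequency `freq` (Hz).

    knee == 0 gives the "fixed" model, a straight line in log-log space.
    knee > 0 gives the "knee" model, which flattens at low frequencies.
    """
    return offset - math.log10(knee + freq ** exponent)

freqs = [1.0, 10.0, 100.0]
fixed = [aperiodic_log_power(f, offset=1.0, exponent=2.0) for f in freqs]
kneed = [aperiodic_log_power(f, offset=1.0, exponent=2.0, knee=100.0)
         for f in freqs]

# Drop in log power per decade of frequency.
fixed_drops = [fixed[0] - fixed[1], fixed[1] - fixed[2]]
kneed_drops = [kneed[0] - kneed[1], kneed[1] - kneed[2]]
```

      With the fixed model the drop per decade is constant, whereas with a knee it is shallow at low frequencies and only approaches the fixed slope at higher frequencies. This is why a slope estimate can depend on which model is fitted.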

      Page 32: my understanding of the result reported here is that cleaning with ICA provided better sensitivity to the effects of age on 1/f activity than cleaning with SSS. Is this accurate? I think this could also be reported in the main manuscript, as it will be useful to researchers considering how to clean their M/EEG data prior to analyzing 1/f activity.

      The reviewer is correct in stating that we overall detected slightly more “significant” effects, when not additionally cleaning the data using SSS. However, I am a bit wary of recommending omitting the use of SSS maxfilter solely based on this information. It can very well be that the higher quantity of effects (when not employing SSS maxfilter) stems from other physiological sources (e.g. muscle activity) that are correlated with age and removed when applying SSS maxfiltering. I think that just conditioning the decision of whether or not maxfilter is applied based on the amount or size of effects may not be the best idea. Instead I think that the applicability of maxfilter for research questions related to aperiodic activity should be the topic of additional methodological research. We therefore now write in Text S1:

      “Considering that we detected fewer and weaker aperiodic effects when using SSS maxfilter, is it advisable to omit maxfilter when analyzing aperiodic signals? We don’t think that we can make such a judgment based on our current results. This is because it's unclear whether or not the reduction of effects stems from an additional removal of peripheral information (e.g. muscle activity; that may be correlated with aging) or is induced by the SSS maxfiltering procedure itself. As the use of maxfilter in detecting changes of aperiodic activity has, to our knowledge, not yet been the subject of analysis, we suggest that this should be the topic of additional methodological research.”

      Page 39, Figure S6 and Figure S8: Perhaps the caption could also briefly explain the difference between maxfilter set to false vs true? I might have missed it, but I didn't gain an understanding of what varying maxfilter would mean.

      Figure S6 shows the effect of ageing on the spectral slope averaged across all channels. The maxfilter set to false in AB) means that no maxfiltering using SSS was performed vs. in CD) where the data was additionally processed using the SSS maxfilter algorithm. We now describe this more clearly by writing in the caption:

      “Supplementary Figure S6: Age-related changes in aperiodic brain activity are most prominently explained by cardiac components, irrespective of maxfiltering the data using signal space separation (SSS) or not. AC) Age was used to predict the spectral slope (fitted at 0.1-145Hz) averaged across sensors at rest in three different conditions (ECG components not rejected [blue], ECG components rejected [orange], ECG components only [green]).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines, they show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron only partially affects the performance index. The authors use calcium imaging to show that the DAN-g1 is not the only one that responds to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role in the assays tested. DAN-f1, which does not respond to salt, is able to lead to the formation of memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, silencing of DAN-f1 together with DAN-g1, enhances the memory deficit of DAN-g1.

      Strengths:

      The paper therefore reveals that also in the Drosophila larva as in the adult, rewards and punishments are processed by exclusive sets of DANs and that a complex interaction between a subset of DANs mediates salt-odor association.

      Overall, the manuscript contributes valuable results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is accessed using specific driver lines which allow for testing of their function individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association with it. Proper genetic controls are carried across the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, and the authors could improve the presentation and interpretation of the data. Specifically, optogenetics seems a better approach than apoptosis, which can affect the overall development of the system, but apoptosis experiments are used to set the grounds of the paper.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set seems to be necessary. This should be better discussed and integrated into the authors' conclusion. The EM data analysis reveals a non-trivial organization of sensory inputs into DANs and it is hard to extrapolate a link to the functional data presented in the paper.

      We would like to thank reviewer 1 for the positive evaluation of our work and for the critical suggestions for improvement. In the new version of the manuscript, we have centralized the optogenetic results and moved some of the ablation experiments to the Supplement. We also discuss in detail the experimental differences in the results. In addition, we have softened our interpretation of the specificity of memory for salt. As a result, we now emphasize more the general role of DANs for aversive learning in the larva. These changes are now also summarized and explained more simply and clearly in the Discussion, along with a revised discussion of the EM data.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act redundantly, and that single-cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli were tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional, and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise, and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows us to define the cellular substrates and pathways of aversive learning down to the single-cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher-order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and clearly discussed in the appropriate context. The authors also implement neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      While there is certainly room for further analysis in the future, the study is very complete as it stands. Suggestions for clarification are minor in nature.

      We would like to thank reviewer 2 for the positive evaluation of our work. In fact, follow-up work is already underway to further analyze the role of the individual DL1 DANs. We have addressed the constructive and detailed suggestions for improvement in our point-by-point responses in the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. However, the authors go far beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine, and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, are necessary for salt learning when either being ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for three of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters, and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      Weaknesses:

      (1) The authors repeatedly claim that they found/proved salt-specific memories. I think this is problematic to some extent.

      (1a) With respect to the necessity of the DL-1 neurons for aversive memories, the authors' notion of salt-specificity relies on a significant reduction in salt memory after ablating DAN-f1 and g1, and the lack of such a reduction in quinine memory. However, Fig. 5K shows a quite suspicious trend of an impaired quinine memory which might have been significant with a higher sample size. I therefore think it is not fully clear yet whether DAN-f1 and DAN-g1 are really specifically necessary for salt learning, and the conclusions should be phrased carefully.

      (1b) With respect to the results of the optogenetic activation of DL-1 neurons, the authors conclude that specific salt memories were established because the aversive memories were observed in the presence of salt. However, this does not prove that the established memory is specific to salt - it could be an unspecific aversive memory that potentially could be observed in the presence of any other aversive stimuli. In the case of DAN-f1, the authors show that the neuron does not even get activated by salt, but is inhibited by sugar. Why should activation of such a neuron establish a specific salt memory? At the current state, the authors clearly showed that optogenetic activation of the neurons does induce aversive memories - the "content" of those memories, however, remains unknown.

      (2) In many figures (e.g. figures 4, 5, 6, supplementary figures S2, S3, S5), the same behavioural data of the effector control is plotted in several sub-figures. Were these experiments done in parallel? If not, the data should not be presented together with results not gathered in parallel. If yes, this should be clearly stated in the figure legends.

      We would also like to thank reviewer 3 for his positive assessment of our work. As already mentioned by reviewer 1, we understand the criticism that the salt specificity for which the individual DANs are coded is not always fully supported by the results of the work. We have therefore rewritten the relevant passages, which are also cited by the reviewer. We have also included the second point of criticism and incorporated it into our manuscript. As the control groups were always measured in parallel with the experimental animals, we can also present the data together in a sub-figure. We clearly state this now in the revised figure legends.

      Summary of recommendations to authors:

      Overall, the study is commendable for its systematic approach and solid methodology. Several weaknesses were identified, prompting the need for careful revisions of the manuscript:

      We thank the reviewers for the careful revision of our manuscript. In the subsequent sections, we aim to address their concerns as thoroughly as possible. A comprehensive one-to-one listing can be found below.

      (1) The authors should reconsider their assertion of uncovering a salt-specific memory, as the evidence does not conclusively demonstrate the exclusive necessity of DAN-f1 and DAN-g1 for salt learning. In particular, the optogenetic activation of DAN-f1 leads to plasticity but this might not be salt-specific. The precise nature of the memory content remains elusive, warranting a nuanced rephrasing of the conclusions.

      We only partially agree. It is true that optogenetic activation of DANs does not by itself allow us to comment on salt-specificity. However, we used high salt concentrations during the test. Over the years, the Gerber lab has demonstrated in several papers that larvae recall an aversive odor-salt memory only if salt is present during the test (Gerber and Hendel, 2006; Niewalda et al 2008; Schleyer et al. 2011; Schleyer et al. 2015). The US that was used has to be present during the test. Even at the same concentration, other aversive stimuli (e.g. bitter quinine) do not allow the larvae to recall this particular type of memory. So, if the optogenetic activation of DAN-f1 establishes a memory that can be recalled on salt, we argue that it has to encode aspects of the salt information. On the other hand, we see the necessity for salt learning only for DAN-g1. And although it is very unlikely based on the current literature, we cannot fully exclude that the activation of DAN-f1 establishes a yet unknown type of memory that can also be recalled on a salt plate. Therefore, we partially agree and have rephrased the entire manuscript accordingly to avoid an over-interpretation of our data. Throughout the manuscript we now avoid the term salt-specific memory and instead describe the type of memory as aversive memory.

      (2) A thorough examination or discussion about the potential influence of blue light aversion on behavioral observations is necessary to ensure a balanced interpretation of the findings.

      To address this point, every behavioral experiment that uses optogenetic blue light activation is run with the appropriate, mandatory controls. For blue light activation experiments, two genetic controls are used that either get the same blue light treatment (effector control, w1118>UAS-ChR2XXL) or no blue light treatment (dark control, XY-split-Gal4>UAS-ChR2XXL). For blue light inactivation experiments, one group is added that has exactly the same genotype but did not receive food containing retinal. These experiments show that blue light exposure itself induces neither an aversive nor a positive memory and does not impair the establishment of odor-high salt memory. In addition, we used the latest established transgenes available. ChR2XXL is very sensitive to blue light. Only 220 lux (60 µW/cm²) were necessary to obtain stable results. In our hands, short-term exposure for up to 5 minutes at such low intensities does not induce blue light aversion. Following the advice of the reviewer, we also address this concern by adding several sentences to the related results and methods sections.

      (3) The authors should address the limitations associated with the use of rpr/hid for neuronal ablations, such as the effects of potential developmental compensation.

      We agree with this concern. It is well possible that the ablation experiments induce compensatory effects during larval development. Such an effect may be the reason for differences in phenotypes when comparing hid,rpr ablation with optogenetic inhibition. This is now part of the discussion. In addition, we evaluated if the ablation worked in our experiments. So far controls were missing that show that the expression of hid,rpr really leads to the ablation of DANs. We now added these experiments and clearly show anatomically that the DANs are ablated (related to figure 4-figure supplement 6).

      (4) While the connectome analysis offers valuable insights into the observed functions of specific DANs in relation to their extrinsic (sensory) and intrinsic (state) inputs, integrating this data more cohesively within the manuscript through careful rewriting would enhance the coherence of the study.

      We understand this concern. Therefore, the new version of our manuscript makes greater use of the EM data in the interpretation of the results. Throughout the manuscript we have rewritten the related parts, and we have completely revised the corresponding section in the results chapter.

      (5) More generally, the authors are encouraged to discuss internal discrepancies in the results of their functional manipulation experiments.

      Thank you for this suggestion. We do of course understand that we have not given the different results enough space in the discussion. We have now changed this and have been happy to comprehensively address the concern. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for clarification and improvement of the manuscript:

      (1) The authors should discuss why the silencing experiment with TH-GAL4 (Fig. 1) does not abolish memory formation (I assume that the PI should go to zero). Does it mean that other non-TH neurons are involved in salt-odor memory formation? Are there other lines that completely abolish this type of learning?

      Thank you very much for highlighting this crucial point. Indeed, the functional intervention does not completely eliminate the memory. There could be several reasons, or a combination thereof, for this outcome. For instance, it's plausible that the UAS-GtACR2 effector doesn't entirely suppress the activity of dopaminergic neurons. Additionally, the memory may comprise different types, not all of which are linked to dopamine function. It's also noteworthy that TH-Gal4 doesn't encompass all dopaminergic neurons – even a neuron from the DL1 cluster is absent (as previously reported in Selcho et al., 2009). Considering we're utilizing high salt concentrations in this experiment, it's conceivable that non gustatory-driven memories are formed based solely on the systemic effects of salt (e.g., increased osmotic pressure). These possibilities are now acknowledged in the text.

      (2) The Rpr experiments in Fig. 4 do not lead to any phenotype and there is a general assumption that the system compensates during development. However, there is no demonstration that Rpr worked or that development compensated for that. What do we learn from these data? Would it make sense to move it to supplement to make the story more compact? In addition: the conclusion at L 236 "DL1.... Are not individually necessary" is later disproved by optogenetic silencing. Similarly, optogenetic silencing of f1+g1 is affecting 1X and 3X learning, but not when using Rpr. Moreover, Rpr did not give any phenotype in other data in the supplementary material. I'm not sure how valid these results are.

      We acknowledge this concern and have actively deliberated various options for restructuring the presented ablation data. Ultimately, we reached a consensus that relocating Figure 4 to the supplement is warranted. Furthermore, corresponding adjustments have been made in the text. This decision amplifies the significance of the optogenetic results. In addition, we also addressed the other part of the concern. We examined the efficacy of hid and rpr in our experiments. Indeed, we successfully ablated specific DANs, as illustrated in the new anatomical data presented in Figure 4- figure supplement 6, which strengthens the interpretation of the hid,rpr experiments.

      (3) In most figures that show data for 1X and 3X training, there is no difference between these two conditions (I would suggest moving one set as a supplement). When a difference appears (Fig.5A-D) the implications are not discussed properly. Is it known that some circuits are necessary for the 1X but not for the 3X protocol? Is that a reasonable finding? I would expect the opposite, but I might lack knowledge here. However, the optogenetic silencing of the same neurons in Figure 7 shows the same phenotype for 1X and 3X. Again, the validity of the Rpr experiments seems debatable.

      Different training protocols lead to different memory phases (STM and STM+ARM). We have shown that in the past in Widmann et al. 2016. Therefore, we are convinced that it makes sense to keep both data sets in the main manuscript. However, we agree that this was not properly introduced and discussed and therefore made the respective changes in the manuscript.

      (4) In Figure 3, it is unclear what the responses were tested against. Since they are so small and noisy there would be a need for a control. Moreover, in some cases, it looks like the DF/F is normalized to the wrong value: e.g. in DAN-c1 100mM, the activity in 0-10s is always above zero, and in pPAM with fructose is always below zero. This might not have any consequence on the results but should be adjusted.

      Thank you very much for your criticism, which we greatly appreciate. We have carefully re-examined the data and found that there was a mistake in the normalization of the values. We made the necessary adjustments to the evaluation, as per your suggestions. The updated figures, figure legends, and results have been incorporated into the new version of the manuscript. As noted by the reviewer, these corrections have not altered the interpretation of the data or the primary responses of the various DANs.
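
      The reviewer's observation follows from how ΔF/F is commonly defined: if the baseline F0 is computed from the pre-stimulus window, the normalised trace should average zero over that window. The following is a minimal sketch under that assumption; the frame indices, the fluorescence values, and the function name are hypothetical and do not reproduce the analysis code used for Figure 3.

```python
def delta_f_over_f(trace, baseline_frames):
    """Return (F - F0) / F0, with F0 the mean over the baseline frames."""
    baseline = [trace[i] for i in baseline_frames]
    f0 = sum(baseline) / len(baseline)
    return [(f - f0) / f0 for f in trace]

# Hypothetical raw fluorescence: 5 pre-stimulus frames, then a response.
raw = [100.0, 102.0, 98.0, 101.0, 99.0, 120.0, 130.0, 110.0]
dff = delta_f_over_f(raw, baseline_frames=range(5))

# Mean of the normalised trace over the baseline window.
baseline_mean = sum(dff[:5]) / 5
```

      A baseline mean that is consistently above or below zero, as the reviewer noted for some traces, would indicate that F0 was taken from a different window or value than intended.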

      (5) In the abstract: "Optogenetic activation of DAN-f1 and DAN-g1 alone suffices to substitute for salt punishment... Each DAN encodes a different aspect of salt punishment". These sentences might be misleading and an overstatement: only DAN-g1 shows a clear role, while the function of the other DANs in the context of salt-odor learning remains obscure.

      We have reworded the respective part of the abstract to avoid any overstatement.

      (6) The physiology is done in L1 larvae but behavior is tested in L3 larvae. There could be a change in this time that could explain the salt responses in c1 and d1 but no role in salt-odor learning?

      While we cannot dismiss the possibility of a developmental change from L1 to L3, a comparison of the anatomy of the DL1 DANs in electron microscopy (EM) and light microscopy (LM) data indicates that their overall morphology remains consistent. However, this comparison does not address the physiology of these cells. We have therefore incorporated this concern into the discussion of the revised manuscript.

      (7) The introduction needs some editing starting at L 129, as it ends with a discussion of a previously published EM data analysis. I would rather suggest stating which questions are addressed in this paper and which methods will be used and perhaps a hint on the results obtained.

      We understand the concern. We have added a concise paragraph to the end of the introduction, highlighting the biological question, the methods used, and a brief preview of the findings.

      (8) It is clear to me that the presentation of salt during the test is necessary for recall, however in L 166 I don't understand the explanation: how is the memory used in a beneficial way in the test? The salt is present everywhere and the odor cue is actually useless to escape it.

      Extensive research, exemplified by Schleyer et al. (2015, eLife), demonstrates that the odor-high salt memory is recalled exclusively when larvae are tested on a high salt plate. Even when tested on a bitter quinine plate, the aversive memory is not recalled. This phenomenon is attributed to the triggering of motivation to recall the memory by the omnipresent abundance of the unconditioned stimulus (US) during the test, which in our case is high salt. Furthermore, the concentration of the stimulus plays a crucial role (Schleyer et al. 2011). The odor cue indicates where the situation could potentially be improved; however, if high salt is absent, this motivational drive diminishes as there is no memory present to enhance the already favorable situation. Additionally, the motivation to evade the omnipresent and unpleasant high salt stimulus persists throughout the entire 5-minute test period.

      (9) L288: the fact that f1 shows a phenotype in this experiment does not mean that it encodes a salt signal, indeed it does not respond to salt. It perhaps induces a plasticity that can be recalled by salt, but not necessarily linked to salt. The synergy between f1 and g1 in the salt assay was postulated based on exp with Rpr, but the validity of these experiments is dubious. I'm not sure there is sufficient evidence from Figures 6 and 7 to support a synergistic action between f1 and g1.

      It is true that, based on ablation, optogenetic inhibition and even physiology, DAN-f1 alone is not necessary for mediating a high salt teaching signal. However, optogenetic activation alone establishes a memory that is expressed when tested on a salt plate. Given the logic explained above, which is accepted in several publications, we would like to keep the statement, especially as joint manipulation with DAN-g1 gives rise to significantly higher or lower values after joint optogenetic activation or inactivation (Figure 5E and F, Figure 6E and F in the new version). Nevertheless, we have modified the sentence. In the text we now describe these effects as follows: “these results may suggest that DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching signal under the conditions that we tested”. We think that this is an appropriate and three-fold restricted statement and would therefore like to keep it in this restricted version. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (10) I find the EM analysis hard to read. First of all, because of the two different graphical representations used in Fig. 8, wouldn't one be sufficient to make the point? Secondly, I could not grasp a take-home-message: what do we learn from the EM data? Do they explain any of the results? It seems to me that they don't provide an explanation of why some DL1 neurons respond to salt and others don't.

      We understand that the EM analysis is hard to read and have now carefully rewritten this part of the manuscript. See also general concern 4 above. The main take-home message is not to explain why some DL1 neurons respond to salt and others do not. This cannot be resolved because information on the salt-perceiving receptor cells is missing: the EM data lack the peripheral nervous system, the first layer of salt information processing. However, our analysis clearly shows that each of the 4 DANs has its own identity based on its connectivity. None of them is the same, although similarities exist to a certain extent. This nicely reflects the physiological and behavioral results. We have now clarified this in the results to aid the reader's understanding. In addition, we clearly state that we do not address why some DL1 neurons respond to salt and others do not.

      (11) Do the manipulations (activation and silencing) affect odor preference in the presence of salt? Did the authors test that the two odors do not drive different behaviors on the salty plate? Or did they only test the odor preference on plain agarose? Can we exclude a role for the DAN in driving multisensory-driven innate behavior?

      Innate odor preferences are not changed by the presence of salt or even other tastants (this work, but see also Schleyer et al. 2015, Figure 3, eLife). Even the naïve choice between two odors is the same if tested in the presence of different tastants (Schleyer et al. 2015, Figure 3, eLife). This shows, at least for the tested stimuli and conditions, which are similar to the ones that we use, that there is no multisensory-driven innate odor-taste behavior. To our knowledge, experiments such as those suggested by the reviewer have therefore never been done in larval odor-taste learning studies. We thus suggest that DAN activation has no effect on innate larval behavior. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (12) L 280: the authors generalize the conclusion to all DL1-DANs, but it does not apply to c1 and d1.

      Thanks for this comment. We deleted that sentence as suggested and thus no longer generalize the conclusion to all DL1 DANs.

      (13) L345: I do not see the described differences in Fig. 8F, presynaptic sites of both types seem to appear in rather broad regions: could the author try to clarify this?

      We understand that the anatomical description of the data is often hard to read, especially for readers who are not used to this kind of figure. We have therefore modified the text to make it easier to follow and to clarify the difference in the labeled brain regions for a broad readership.

      (14) L373: the conclusion on c1 is unsupported by data: this neuron responds to both salt and fructose (Figure 3 ) while the conclusion is purely based on EM data analysis.

      The sentence is not a conclusion but a speculation and we also list the cell's response to positive and negative gustatory stimuli. Therefore, we do not understand exactly what the reviewer means here. However, we have tried to address the criticism and have revised the sentences.

      (15) L385: the data on d1 seem to be inconsistent with Eschbach 2020, but the authors do not discuss if this is due to the differential vs absolute training, or perhaps the presence of the US during the test (which does not seem to be there in Eschbach, 2020) - is the training protocol really responsible for this inconsistency? For f1 the data seem to be consistent across these studies. The authors should clarify how the exp in Fig 6 differs from Eschbach, 2020 and how one could interpret the differences.

      This concern is correct. We now discuss the difference in more detail. Eschbach et al. used CsChrimson as a genetic tool, a one-odor paradigm with 3 training cycles, and no gustatory cues in their approach. These differences are now discussed in the new version of the manuscript.

      (16) L460-475 A long part of this paragraph discusses the similarities between c1 and d1 and corresponding PPL1 neurons in the adult fly. However, c1 and d1 do not really show any phenotype in this paper, I'm not sure what we learn from this discussion and how much this paper can contribute to it. I would have wished for a discussion of how one could possibly reconcile the observed inconsistencies.

      Based on the comments of the different reviewers several paragraphs in the discussion were modified. We agree that the part on the larval-adult comparison is quite long. Thus we have shortened it as suggested by the reviewer.

      Minor corrections:

      L28 "resultant association" maybe resulting instead.

      L55 "animals derive benefit": remove derive.

      L78 "composing 12,000 neurons": composed of.

      L79 what is stable in a "stable behavioral assay"?

      L104: 2 times cluste.

      L122: "DL1 DANs are involved" in what?

      Fig. 1 please check subpanels labels, D repeats.

      L 362: "But how do individual neurons contribute to the teaching signal of the complete cluster?" I don't understand the question.

      L364 I did not hear before about the "labeled line hypothesis" in this context - could the author clarify?

      L368: edit "combinatorically".

      L390: "current suppression" maybe acute suppression.

      L 400 I'm not sure what is meant by "judicious functional configuration" and "redundancy". The functions of these cells are not redundant, and no straightforward prediction of their function can be done from their physiological response to salt.

      Thanks a lot for your detailed review of our manuscript. We welcome your well-taken concerns and have made the requested changes for all points that you have raised.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1 the reconstruction of pPAM and DL1 DANs shows the compartmentalized innervation of the larval MB. However, the images are a bit low in color contrast to appreciate the innervation well. In particular in panel B, it is hard to identify the innervated MB body structure. A schematic model of the larval MB and DAN innervation domains like in Fig. 2A would help to clarify the innervation pattern to the non-specialist.

      We understand this concern and have changed Figure 1 as suggested by the reviewer. A schematic model of the MB and DANs is now presented in Figure 1 as well as in the corresponding supplemental figure.

      (2) Blue light itself can be aversive for larvae and thus interfere with the aversive learning paradigm. Does the given Illuminance (220 lux) used in these experiments affect the behavior and learning outcome?

      Yes. Previously, high intensities of blue light were necessary to trigger first-generation optogenetic tools, and the high-intensity blue light itself was able to establish an aversive memory (e.g. Rohwedder et al. 2016). Second-generation optogenetic tools allowed us to strongly reduce the applied light intensity. We now use 220 lux (equal to 60 µW/cm²). Please note that none of the Gal4 and UAS controls in the manuscript differs significantly from zero. The mild blue light stimulation therefore does not serve as a teaching signal and has neither an aversive nor an appetitive effect. Furthermore, we use this mild light intensity for several other behavioral paradigms (locomotion, feeding, naïve preferences) and have never seen an effect on behavior.

      (3) Fig.2: Except for MB054B-Gal4 only the MB expression pattern is shown for other lines. Is there any additional expression in other cells of the brain? In the legend in line 761, the reporter does not show endogenous expression, rather it is a fluorescent reporter signal labeling the mushroom body.

      The lines were initially identified in a screen on larval MB neurons done together with Jim Truman, Marta Zlatic and Bertram Gerber. In that screen, full brain scans were always analyzed. These images can be seen in Eschbach et al. 2020, extended figure 1. Neither in their evaluation nor in our anatomical evaluation (using a different protocol) was additional expression in brain cells detectable. We also modified the figure legend as suggested.

      (4) Fig.3: Precise n numbers per experiment should be stated in the figure legend.

      True, we now present n numbers per experiment whenever necessary.

      (5) Fig.4: Have the authors confirmed complete ablation of the targeted neuron using rpr/hid? Ablations can be highly incomplete depending on the onset and strength of Gal4 expression, leaving some functionality intact. While the ablation experiments are largely in line with the acute silencing of single DANs during high salt learning performed later on (Fig.7), there is potentially an interesting aspect of developmental compensation hidden in this data. Not a major point, but potentially interesting to check.

      We agree with this criticism. We had not tested whether the expression of hid,rpr in DL1 DANs really ablates them. We therefore did an additional experiment to show this. The new data are now presented as a supplemental figure (Figure 4- figure supplement 6). The result shows that expression of hid,rpr also ablates DL1 DANs, similar to earlier experiments in which we used the same effectors to ablate serotonergic neurons (Huser et al., 2012, figure 5).

      (6) The performance index in Fig. 4 and 5 sometimes seems lower and the variability is higher than in some of the other experiments shown. Is this due to the high intrinsic variability of these particular experiments, or the background effects of the rpr/hid or splitGal4 lines?

      The variability of these experiments is within the expected and known range. In this kind of experiment there is always some variation due to external factors (e.g. the time of year at which the experiment is run). It is therefore important to measure control and experimental animals at the same time. We did so, and we compare results directly only within individual datasets, not between different datasets. Comparison across datasets is further hampered because the experiments of Figure 4 (now Figure 4- figure supplement 1) and Figure 5 (now Figure 4) differ in several parameters from the other learning experiments presented later in the text. Optogenetic activation uses blue light stimulation instead of “real world” high salt. Most often, direct activation of specific DANs in the brain is more stable than external high salt stimulation. Optogenetic inactivation also uses blue light stimulation and, in addition, retinal-supplemented food. Both factors can affect the measurement. We therefore argue that, for each experiment, it is most often the particular parameters that affect the variability of the results rather than background effects of the rpr/hid and split-Gal4 lines.

      (7) Fig.7: This is a neat experiment showing the effects of acute silencing of individual DL1 DANs. As silencing DAN-f1/g1 does not result in complete suppression of aversive learning, it would be highly interesting to test (or speculate about) additive or modulatory effects by the other DANs. Dan-c-1/d-1 also responds to high salt but does not show function on its own in these assays. I am aware that this is currently genetically not feasible. It would however be a nice future experiment.

      True. We screened intensively for DL1 cluster-specific driver lines that cover all 4 DL1 neurons or combinations other than the ones we tested. Unfortunately, we did not succeed in identifying them. Nevertheless, we will screen new genetic resources (e.g. Meissner et al., 2024, bioRxiv) to expand our approach in future experiments. Please also see our comment on concern 1 of reviewer 1 for further technical limitations and biological questions that could also explain the absence of complete suppression of high salt learning and memory. Some of these limitations are now also mentioned and discussed in the new version of the manuscript.

      (8) The discussion is excellent. I would just amend that it is likely that larval DAN-c1, which has high interconnectivity within the larval CNS, is likely integrating state-dependent network changes, similar to the role of some DANs in innate and state-dependent preference behavior. This might contribute to modulating learned behavior depending on the present (acute) and previous environmental conditions.

      Thanks a lot for bringing this up. We rewrote this part and added a discussion on recent work on DAN-c1 function in larvae as well as results on DAN function in innate and state-dependent preference behavior.

      (9) Citation in line 1115 missing access information: "Schnitzer M, Huang C, Luo J, Je Woo S, Roitman L, et al. 2023. Dopamine signals integrate innate and learned valences to regulate memory dynamics. Research Square".

      Unfortunately this escaped our notice. The paper is now published in Nature: Huang, C., Luo, J., Woo, S.J. et al. Dopamine-mediated interactions between short- and long-term memory dynamics. Nature 634, 1141–1149 (2024). https://doi.org/10.1038/s41586-024-07819-w. We have now changed the citation. The new citation includes the missing access information.

      Reviewer #3 (Recommendations For The Authors):

      Regarding my issue about salt specificity in the public review, I want to make clear that I do not suggest additional experiments, but to be very careful in phrasing the conclusions, in particular whenever referring to the experiments with optogenetic activation. This includes presenting these experiments as "(salt) substitution" experiments - inferring that the optogenetic activation would substitute for a natural salt punishment. As important and interesting as the experiments are, they simply do not allow such an interpretation at this point.

      Results, line 140ff: When presenting the results regarding TH-Gal4 crossed to ChR2-XXL, please cite Schroll et al. 2006 who demonstrated the same results for the first time.

      Thanks for mentioning this. We now cite Schroll et al. 2006 here in the text of the manuscript.

      Figure 3: The subfigure labels (ABC) are missing.

      Unfortunately this escaped our notice. Thanks a lot – we have now corrected this mistake.

      Figure 5: For I and L, it reads "salt replaced with fru", but the sketch on the left shows salt in the test. I assume that fructose was not actually present in the test, and therefore the figure can be misleading. I suggest separate sketches. Also, I and L are not mentioned in the figure legend.

      True, this is rather confusing. Based on this well-taken concern, we have changed the figure by adding a new, correct scheme for sugar reward learning that does not show fructose during the test.

      Figure S1: The experimental sketches for E,F and G,H seem to be mixed up.

      We thank the reviewer for bringing this up. In the new version we corrected this mistake.

      Figure S5: There are three sub-figures labelled with B. Please correct.

      Again, thanks a lot. We made the suggested correction in Figure S5.

      Discussion, line 353ff: this and the following sentences can be read as if the authors have discovered the DL-1 neurons as aversive teaching mediators in this study. However, Eschbach et al. 2020 already demonstrated very similar results regarding the optogenetic activation of single DL-1 DANs. I suggest to rephrase and cite Eschbach et al. 2020 at this point.

      That is correct. Our focus was on the gustatory pathway. The original discovery was made by Eschbach et al. We have now corrected this in the discussion and clarified our contribution. It was never our intention to hide this work, as the laboratory was also involved. Nevertheless, this is a regrettable omission on our part.

      Line 385-387: this sentence is only correct with respect to Eschbach et al. 2020. Weiglein et al. 2021 used ChR2-XXL as an effector, but another training regimen.

      We understand this criticism. Therefore, we changed the sentence as suggested by the reviewer. See also our response on concern 15 of reviewer 1.

      Line 389ff: I do not understand this sentence. What is meant by persistent and current suppression of activity? If this refers to the behavioural experiments, it is misleading as in the hid, reaper experiments neurons are ablated and not suppressed in activity.

      We made the requested changes in the text. It is true that the ablation of a neuron throughout larval life is different from constantly blocking the output of a persisting neuron.

      Methods, line 615 ff: the performance index is said to be calculated as the difference between the two preferences, but the equation shows the average of the preferences.

      Thanks a lot. We are sorry for the confusion. We have carefully rewritten this part of the methods section to avoid any misunderstanding.

      When discussing the organization of the DL1 cluster, on several occasions I have the impression the authors use the terms "redundant" and "combinatorial" synonymously. I suggest to be more careful here. Redundancy implies that each DAN in principle can "do the job", whereas combinatorial coding implies that only a combination of DANs together can "do the job". If "the job" is establishing an aversive salt memory, the authors' results point to redundancy: no experimental manipulation totally abolished salt learning, implying that the non-manipulated neurons in each experiment sufficed to establish a memory; and several DANs, when individually activated, can establish an aversive memory, implying that each of them indeed can "do the job".

      Based on this concern, we have rewritten the discussion as suggested to be more precise when talking about redundancy or combinatorial coding of the aversive teaching signal. In essence, we have removed all the combinatorial terms and replaced them with the term “redundancy”.

      The authors mix parametric and non-parametric statistical tests across the experiments dependent on whether the distribution of the data is normal or not. It would help readers if the authors would clearly state for which data which tests were used.

      We understand the criticism and have now added a supplemental file that includes all the information on the statistical tests applied and the distribution of the data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study experimentally examined diet-microbe-host interactions through a complex systems framework, centered on dietary oxalate. Multiple independent molecular, animal, and in vitro experimental models were used in this research. The authors found that microbiome composition influenced multiple oxalate-microbe-host interfaces. Oxalobacter formigenes was only effective against a poor oxalate-degrading microbiota background, a finding that gives critical new insight into why clinical intervention trials with this species exhibit variable outcomes. Data suggest that, while heterogeneity in the microbiome impacts multiple diet-host-microbe interfaces, metabolic redundancy among diverse microorganisms in specific diet-microbe axes is a critical variable that may impact the efficacy of bacteriotherapies, which can help guide patient and probiotic selection criteria in probiotic clinical trials.

      Thank you. The main message of this research is that, through complex modelling, we believe we have identified the critical variable (metabolic redundancy) that is responsible for the efficacy of probiotics designed to reduce oxalate levels, thus allowing for improved patient selection in clinical trials. We also believe that this process and the critical features identified can be translated to other critical microbial functions such as short chain fatty acid synthesis, secondary bile acid synthesis, and others.

      Strengths:

      The paper has made significant progress in both the depth and breadth of scientific research by systematically comparing multiple experimental methods across multiple dimensions. Particularly through in-depth analysis from the enzymatic perspective, it has not only successfully identified several key strains and redundant genes, which is of great significance for understanding the functions of enzymes, the characteristics of strains, and the mechanisms of genes in microbial communities, but also provided a valuable reference for subsequent experimental design and theoretical research.

      More importantly, the establishment of a novel research approach to probiotics and gut microbiota in this paper represents a major contribution to the current research field. The proposal of this new approach not only breaks through the limitations of traditional research but also offers new perspectives and strategies for the screening, optimization of probiotics, and the regulation of gut microbiota balance. This holds potential significant value for improving human health and the prevention and treatment of related diseases.

      Thank you for the comments. We believe that the approach taken here, which contrasts with conventional reductionist techniques, will be critical for translating gut microbiome research into actionable therapeutic approaches.

      Weaknesses:

      While the study has excellently examined the overall changes in microbial community structure and the functions of individual bacteria, it lacks a focused investigation on the metabolic cross-feeding relationships between oxalate-degrading bacteria and related microorganisms, failing to provide a foundational microbial community or model for future research. Although this paper conducts a detailed study on oxalate metabolism, it would be beneficial to visually present the enrichment of different microbial community structures in metabolic pathways using graphical models.

      Thank you for this critique. In the current study, we broadly examined the response of the gut microbiota to dietary oxalate. Based on initial shotgun metagenomic results, we focused on specific taxa and metabolic functions. Through metagenomic and multiple culture-based studies, we quickly homed in on redundancy in oxalate-degrading function as a key feature for oxalate homeostasis. We believe that the defined microbial community we used for microbial transplants (particularly the taxonomic cohort) provides a strong, minimal community to explore oxalate homeostasis further. In fact, we are using this consortium in multiple follow-up studies to fully understand the cross-feeding that may occur among these microorganisms, as you suggest. We note that Figure 3 shows the change of species and metabolic pathways with oxalate exposure.

      Furthermore, the authors have done a commendable job in studying the roles of key bacteria. If the interactions and effects of upstream and downstream metabolically related bacteria could be integrated, it would provide readers with even more meaningful information. By illustrating how these bacteria interact within the metabolic network, readers can gain a deeper understanding of the complex ecological and functional relationships within microbial communities. Such an integrated approach would not only enhance the scientific value of the study but also facilitate future research in this area.

      Thank you. We note that, based on the collective data obtained in this study, redundancy in oxalate degradation is the critical feature that maintains oxalate homeostasis. However, we are interested in potential metabolic interactions between microbes in our defined community and are currently investigating them.

      Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation relies on the third phase since phase I and phase II are the dominant approaches everyone in the microbiome field uses.

      Thank you. As you note, the proposed phases I and II are the predominant approaches used. In fact, many clinical trials have been conducted to try to reduce urine oxalate in patients, based solely on mechanistic studies with Oxalobacter formigenes. As noted in our manuscript, only 43% of those studies resulted in the intended outcome, necessitating the approach we took in the current study. Our results suggest that the reason for the high rate of failure, despite well-established mechanisms, is insufficient patient selection that focused only on the presence or absence of O. formigenes, a species that normally exhibits very low prevalence and abundance in the human gut microbiota.

      Strengths:

      The authors used a multidisciplinary approach which included:

      (1) fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations;

      (2) longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and

      (3) development of synthetic microbial communities around oxalate metabolisms and evaluation of these communities' activity in oxalate degradation in vivo.

      Thank you for these comments.  In the complex modelling approach, we focused on complete microbiota from host species known to have high and low capacities for oxalate tolerance, combined with targeting specific metabolic functions vs. specific taxa that may include unknown functions important for oxalate metabolism.  Further, we examined the influence of our target communities on oxalate metabolism through multiple in vitro and in vivo studies.

      Weaknesses:

      However, I have concerns about the frame the authors tried to provide for a 'complex system theoretical approach' and how the data are interpreted within this frame. Several of the conclusions the authors provide do not seem to have sufficient data to support them.

      Thank you.  We have tried to address these concerns by adding an exhaustive figure that broadly represents our complex modelling approach that includes potential complex system-based hypotheses, how they were tested, and the host-microbiome-oxalate interactions found in our study.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      (1) The authors argue about the importance of bringing 'Complex System Theory' to the microbiome field systematically and consistently. However, the authors fail to introduce this theory throughout the entire manuscript. For example, the authors tried to describe key elements and their nomenclature, such as nodes and fractal layers, in the first part of the result section. But the description is wordy and not precise. It would be more useful if the authors connected the model description with a visual representation, such as a figure. Unfortunately, these elements are not emphasized or carried across the results section and are not mentioned in the discussion section.

      We have now added a figure (Figure 7) that details this process extensively and ties each of our findings to the complex system model and nomenclature.  We have also reiterated how our results fit in the complex system model in the discussion.

      In addition, there is no straightforward approach to integrating multi-omics datasets to identify the variables that are determinants of the system. For example, Figure 1 focuses on the host response (hepatic activity) to oxalate exposure following fecal transplants into Swiss Webster mice; Figure 2 focuses on the effects of oxalate exposure on stool metabolic activity, not only microbial metabolic activity, following fecal transplants into Swiss Webster mice; and Figure 3 focuses on microbiome responses to different oxalate concentrations in Neotoma albigula. There is no "model" to really integrate the host, the microbiome activity, and the microbiome composition information. And, unfortunately, the data generated between experiments cannot be directly integrated; see major concern # 2.

      Thank you. We have clarified the experimental approach and how it applied to understanding the critical factors that maintain oxalate homeostasis. Specifically, Figure 1 established that the effect of oxalate on the host was dependent on the microbiota, rather than host genetics. Figure 2 established that the effect of oxalate on the gut microbiota was again dependent on the whole gut microbiota and that these oxalate-microbe effects also influenced oxalate-host effects through a direct multi-omic data integration. Once we established that the oxalate effects on host and microbiota were dependent on the whole microbiota composition, Figure 3 then sought to determine how oxalate impacted the gut microbiota, using our model of high oxalate tolerance (N. albigula). With the finding in Figure 3 that there were multiple genes attributed to the degradation of oxalate, or acetogenic, methanogenic, and sulfate-reducing pathways, Figure 4 and relevant supplemental figures sought to quantify the redundancy of these pathways. After establishing a very high degree of redundancy, we used a culturomic approach to determine what environmental factors impacted oxalate metabolism and to evaluate oxalate metabolism using our defined, hypothesized communities of microorganisms. Finally, Figure 6 sought to validate our metagenomic, metabolomic, and culturomic results from multiple animal and in vitro models using targeted microbial transplants in mice. While we did have some direct multi-omic data integration (Figures 2 and 3), the process employed here sought to systematically determine which factors were most important for the oxalate-microbiota-host relationship, and then to use those results to design the subsequent experiments. We have added this description to the discussion, which helps to contextualize the complex system modelling approach we took here.

      Finally, the authors did not provide a novel variable that successfully influences oxalate degradation in the oxalate-microbiome-host system. The authors argue that "both resource availability and community composition impact oxalate metabolism," which is currently inferred from the failure of the clinical trials, and do not provide a clear intervention strategy to develop functional bacteriotherapy. The identification of composition as an important variable that was predictable without any multi-omics approach was highlighted by the development of synthetic microbial communities. Synthetic microbial communities are critical to characterizing complex microbiomes. Still, the authors did not explain how this strategy can be used in their theoretical framework (that is their goal), and these communities are not well introduced across the manuscript; see major concern # 4.

      As stated, it is clear from the failed clinical trials that we do not fully understand what microbial features dictate oxalate homeostasis.  We have specifically identified, through fecal transplant studies, that microbial composition is critical for oxalate homeostasis and that diverse oxalate-degrading bacteria exist.  However, ours is the first study that explicitly shows that it is this diversity that controls oxalate homeostasis.  This is specifically ascertained through the targeted microbial transplants in mice whereby O. formigenes was given alone or with different combinations of other microorganisms.  In other words, we were able to replicate both successful and failed studies by manipulating which specific species were introduced into animals.  This is unprecedented in the literature.

      (2) The authors provide several conclusions that are not completely supported by the data available. For example:

      (a) Lines 236-239: "Within the framework of complex systems, results show microbe-host cooperation whereby oxalate effectively processed within the SW-NALB gut microbiota reduced overall liver activity, indicative of a beneficial impact." - The authors did not provide data related to oxalate levels or oxalate processing for this dataset.

      While we did not specifically quantify oxalate degradation for this specific study, as cited in the text when describing this Swiss-Webster, Neotoma albigula system, we have previously published multiple animal studies explicitly showing that the N. albigula animals were highly effective oxalate degraders, which is transferable to Swiss-Webster mice through fecal transplants. Since the gut microbiota’s impact on oxalate has been well established through experiments by our group, the purpose of these specific experiments was to look in the other direction and examine the effect of oxalate on the gut microbiota of these two animal models. In the referenced text, we again cited our studies showing that the SW-NALB system effectively degrades oxalate.

      (b) Lines 239-243: "Data also suggest that both the gut microbiota and the immune system are involved in oxalate remediation (redundancy), such that if oxalate cannot be neutralized in the gut microbiota or liver, then the molecule will be processed through host immune response mechanisms (fractality), in this case indicated through an overall increase in hepatic activity and specifically in mitochondrial activity." - The authors did not provide any evidence related to the immune system and oxalate metabolism.

      We corrected that statement as follows: “…in this case indicated through an overall increase in inflammatory cytokines with oxalate exposure combined with an ineffective oxalate-degrading microbiota (Figures S6a,b; S9a,b).”  In other words, if the liver and gut microbiota can’t eliminate a toxin, then the immune system must deal with it through inflammatory pathways.  Oxalate is a well established, pro-inflammatory compound.  Our data show that this is dependent on the gut microbiota.

      (c) Lines 250-252: "Following the diet trial, colon stool was collected post-necropsy and processed for untargeted metabolomics, which is a measure of total microbial metabolic output." - Although most metabolites in stool samples are indeed microbial, there are also host metabolites. So, it is not technically correct to relate the metabolomic analysis of stool samples to only microbial metabolic analysis. In addition, the authors discussed compounds such as alkaloids and cholesterol as microbial metabolites, although these compounds are more related to the diet and the host, respectively.

      We have corrected this to state: “total metabolites present in stool from the diet, microbial activity, and host activity”

      (d) Lines 270-273. "Specifically, the SW-NALB mice exhibit hallmarks of homeostatic feedback with oxalate exposure to maintain a consistent metabolic output, defined by the relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice." - How do the authors define oxalate homeostasis? In addition, do the authors imply feedback between the liver and the microbiome in which the microbiome responds to a liver response related to oxalate levels? Or could the observation in Figure 1 be explained just by microbial consumption of oxalate that would reduce the impact of oxalate that arrives at the liver?

      Oxalate homeostasis is defined in that sentence: “relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice” – in other words, for SW-NALB mice, oxalate did not produce a considerable change to either microbial or hepatic metabolic activity. We did not test the liver impact on gut microbiota and can’t speak to that. We believe, based on Figure 2 data, that it is not just the degradation of oxalate that explains the lack of change in hepatic activity in SW-NALB mice, but rather that the oxalate-induced shift in the gut microbiota metabolic activity broadly altered hepatic activity, as inferred from Figure 2c. We made this more clear in the results: “suggests that the oxalate-induced change in microbial metabolism is responsible for the change in hepatic activity”.

      (e) Lines 297-301: "The oxalate-dependent metagenomic divergence of the NALB gut microbiota (Figure 3), combined with the lack of change in the microbial metabolomic profile with oxalate exposure (Figure 2), suggest that oxalate stimulates taxonomically diverse, but metabolically redundant microorganisms, in support of maintaining homeostasis." - The authors cannot conclude anything about the relationship between taxonomic changes and microbial activity since the taxonomic data presented is for microbial enrichment in N. albigula, and the "microbial activity data" is from the fecal transplantation experiment in SWM. These are two completely different systems with two completely different experimental designs.

      We have shown in multiple previous studies that oxalate induces taxonomic divergence of the NALB gut microbiota. The experiment in which a minimal, positive increase in microbial metabolites was seen with oxalate was based on the SW-NALB model, whereby Swiss-Webster mice have an NALB microbiota. We show throughout the manuscript that the impact of oxalate is highly microbiota dependent, which supports our claim. However, the claim that metabolic redundancy is important for oxalate homeostasis is hypothesis generating. We modified our statement to make all of this more clear.

      Related to microbial composition, the authors did not show data validating the efficiency of the fecal transplantations (allograft or xenograft) in the SWM after antibiotic treatment. They also did not show evidence of microbial composition dynamics in response to oxalate exposure.

      Again, the efficacy of fecal transplants, used in the way they were here, has been shown in multiple past studies of our group.  In past studies, we have extensively characterized the microbiota from fecal transplants and which taxa were associated with oxalate levels.  Therefore, that topic was not the focus of the current study, instead focusing on the oxalate impact on gut microbiota activity.  Our past studies, referenced multiple times through the current manuscript, were used in large part to help determine which microbes to include in our taxonomic cohort, as described in the manuscript.

      (f) Lines 301-303: "Given that data came from the same hosts sampled longitudinally, these data also reflect a microbiota that is adaptive to oxalate exposure, which is another important characteristic of complex systems." - In their dataset, what is the evidence that the microbiota of N. albigula is adapted to oxalate exposure? Is the increase in genomes with pathways related to oxalate metabolism related to an increase of oxalate in the diet? If so, does exposure of the microbiota to a higher oxalate concentration decrease the systemic level of oxalate? In none of the experiments related to Figures 1 to 3 did the authors show a correlation of systemic oxalate levels with microbial composition, hepatic host response, or stool metabolism.

      Figure 3 explicitly shows the longitudinal impact of increasing levels of oxalate, including an increase in oxalate-degrading genes (Figure 3d). The specific samples selected for analysis here come from a previous study in which we explicitly quantified changes to the gut microbiota composition and both stool and urine oxalate for every time point listed in Figure 3a. This information is explicitly stated in the methods, along with the fact that “neither fecal nor urinary oxalate levels increased significantly.” Again, the effect of the gut microbiota on oxalate in these model systems has been extensively studied by our group and provides the foundation for the current study to look at the effect of oxalate on the gut microbiota and host.

      Considering my last two points, the authors do not present substantial evidence to support their hypothesis that oxalate stimulates taxonomically diverse, metabolically redundant communities.

      As stated above, that oxalate stimulates taxonomically diverse taxa was ascertained through multiple past studies, as well as the current study (Figure 3e).  The metabolically redundant part is ascertained both through untargeted metabolomics (Figure 2a,b) and shotgun metagenomics (Figure 3c,d).  Further evidence for the metabolic redundancy with oxalate comes from our culturomic approach, which showed that 14.58% of isolates could grow on oxalate as a carbon and energy source, in addition to the high proportion of isolates that could grow on other carbon and energy sources, at least much more than can be ascribed to a single species  (Figure 5c).  We made this more clear in the discussion.

      (g) Lines 330-335. "Additionally, the broad diversity of species that contain oxalate-related genes suggests that the distribution of metabolic genes is somewhat independent of the distribution of microbial species, which suggests that microbial genes exist in an autonomous fractal layer, to some degree. This hypothesis is supported by studies which show a high degree of horizontal gene transfer within the gut microbiota as a means of adaptation." - This conclusion is highly speculative, especially since the authors did not do any analysis to directly evaluate a relationship between the oxalate metabolic pathways and the microbial species where these pathways are present.

      Figure 3c,d,e explicitly shows the metabolic pathways and species enriched by oxalate exposure.  Figure 4d, generated using the same data from Figure 3, explicitly shows the taxa that harbor oxalate-degrading genes.   

      (h) Lines 364-366. "Collectively, data show that both resource availability and community composition impacts oxalate metabolism, which helps to define the adaptive nature of the NALB gut microbiota." - The authors indeed showed evidence that community composition impacts oxalate metabolism. However, the authors did not show any evidence directly evaluating the impact of resource availability on oxalate metabolism.

      This is explicitly shown through in vitro community-based and single species assays varying multiple different carbon and energy sources to quantify changes to oxalate degradation (chosen based on shotgun metagenomic results; Figure 5a,b).

      (3) Lines 321-325. "Acetogenic genes were also present in 97.18% of genomes, dominated by acetate kinase and formate-tetrahydrofolate ligase (Figure S3A-C). Methanogenic genes were present in 100% of genomes, dominated by phosphoserine phosphatase, ATP-dependent 6-phosphofructokinase, and phosphate acetyltransferase (Figure S4A-C)." - The authors spent much time analyzing the adjacent pathways related to oxalate and oxalate-related products of oxalate metabolism. However, my understanding is that the genes used to analyze these pathways (formate metabolism, acetogenesis, methanogenesis), such as the ones named above, are not unique/specific for those pathways but participate in other "housekeeping" pathways. What is the relevance of these analyses when those genes are not unique/specific to the function/pathways that the authors describe? If I infer correctly, these bioinformatic analyses aim to evaluate the hypothesis of whether oxalate metabolism could be a social/cooperation metabolism and whether other species could participate in the metabolism of oxalate subproducts. However, these analyses did not explicitly evaluate this hypothesis.

      The reviewer is correct in that we aimed to evaluate the potential that oxalate metabolism could benefit from metabolic cooperation. The specific genes chosen for this analysis were those explicitly listed in the target metabolic pathways in KEGG, as described. However, while the analyses do show the strong potential that the CO2 and formate produced from oxalate degradation could be used in these other pathways, as intended, the genes can be used in other metabolic pathways. We did, however, explicitly test the hypothesis that formate, produced from oxalate degradation, could be utilized by the gut microbiota. While the targeted transplants with the taxonomic cohort did not clearly show the use of formate in this way, those from the metabolic cohort did (Figures 6d and S8d). This question is still under investigation in our group.

      We have made it more clear that our genome analyses provide the potential for metabolic redundancy rather than definitive proof for metabolic redundancy, which was evaluated more extensively in other experiments from this study.

      (a) Lines 481-484. "Collectively, data offer strong support for the hypothesis that metabolic redundancy among diverse taxa, is the primary driver of oxalate homeostasis, rather than metabolic cooperation in which the by-products of oxalate degradation are used in downstream pathways such as acetogenesis, methanogenesis, and sulfate reduction." - Although the authors recognize that their data about the metabolic cooperation hypothesis is inconclusive, they never tested the hypothesis related to metabolic cooperation, as mentioned above. This is highly speculative.

      As stated above, the targeted microbial transplants to animals and in vitro studies (Figure 5e,f) did explicitly test the cooperation hypothesis, but the results did not support it and instead pointed much more strongly to metabolic redundancy.

      (4) Lines 355-359. "Cohorts, defined in the STAR methods, were used to delineate hypotheses that either carbon and energy substrates are sufficient to explain known effects of the oxalate-degrading microbial network or that additional aspects of taxa commonly stimulated by dietary oxalate are required to explain past results (taxa defined through previous meta-analysis of studies)." - The definition of the metabolic cohorts and the taxonomic cohorts should not be hidden in the material and methods section. It should be explicit and clearly explained in the main text. Related, the table presented in Figure 5D is exceptionally confusing and does not help to understand and differentiate between the metabolic and the taxonomic cohorts. The authors need to explicitly identify the synthetic communities used in each cohort and each group by their members and their characteristics in supplementary tables.

      In the sentences before those referenced, we state: “Culturomic data recapitulates molecular data to show a considerable amount of redundancy surrounding oxalate metabolism (Fig. 5C). Isolates generated from this assay were used for subsequent study (metabolic cohort; Figure 5D). Additionally, a second cohort was defined and commercially purchased based both on known metabolic functions and the proportion of studies that saw an increase in their taxonomic population with oxalate consumption (Fig. 5D; taxonomic cohort). Where possible, isolates from human sources were obtained.”  Figure 5d explicitly shows the specific species used in each cohort along with the groups they were in for transplant studies, the explicit metabolic pathways we were targeting, along with the % of studies that these species were associated with oxalate metabolism.  All of this information is both in the main text of the results and in the figure legends.  It is not hidden in the methods, but the methods do reiterate what was also placed in the results.   

      In Figures 5 and 6, the authors used the following groups with the corresponding nomenclature: 'Group 1, No_bact; Group 2, Ox; Group 3, Ox_form; Group 4, All; Group 5, No_ox'. Although the information related to these groups is present in the material and method section in lines 1139-1143, the authors also need to explicitly explain the groups and their nomenclature in the main text.

      Since this information is explicitly and succinctly given in the referenced figures, I believe that adding the same information in the text would be too redundant.

      Related to the development of the synthetic communities. How did the authors prepare the synthetic communities or 'cohort' for the in vitro experiments? 

      We added more information for the preparation of microbes and execution of the in vitro assays, as needed.  

      Also, it is unclear in the material and method section how the metabolic profile of each isolate was evaluated (Figure 5C). Related to the bacteria isolated from the culturomic assays, including Figure 5C and metabolic cohort, the authors indeed reported the isolation methodology in lines 1262-1275. However, there is no information about the sequencing of these isolates. The authors should present these isolates as a list (supplementary table) with their names, taxonomy, metabolic profile, and Genome ID if these genomes were submitted to NCBI.

      We added additional information for how metabolic cohort isolates were chosen and how they were taxonomically identified.  The taxonomy and substrate utilization of isolates are in Figure 5D.  We did not sequence the genomes of metabolic cohort bacteria.  However, the ATCC isolates, which comprise the taxonomic cohort, are publicly available.

      The authors presented the 248 metagenomic assemblies in Figure S1 in a circular chart in context with other genomes. However, the metagenomic assemblies should be presented in a table form, with their name, taxonomy, coverage, completeness, and Genome ID, if these genomes were submitted to NCBI.

      The information for the genomes submitted to the NCBI is provided in the data availability statement.  However, we added a table (Table S9) that includes the requested information.   

      (5) Lines 371-374: "To delineate hypotheses of metabolic redundancy or cooperation for mitigating the negative effects of oxalate on the gut microbiota and host, two independent diet trials were conducted with analogous microbial communities derived from the metabolic and taxonomic cohorts".

      Lines 494-496: "we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present" - What is the evidence that oxalate has a negative effect on the gut microbiota? The authors clearly showed the negative effect of oxalate on the host. Although there are reports in the literature of oxalate consumers with a negative effect on the microbiome, such as Lactobacilli and Bifidobacteria, there is no evidence in this manuscript about a negative effect of oxalate on the microbiome, and there is not an experimental design to evaluate it.

      These data are presented in Figure 2A and B. As stated, oxalate led to a net reduction of 34 in the total microbial metabolites produced, with a significant shift in the overall metabolome, indicative of metabolic inhibition. This is in comparison to the net gain of 9 metabolites, with no significant shift overall, in the mice with the NALB microbiota. The positive and negative effects of oxalate on the whole gut microbiota here are bolstered by previous studies on the effect of oxalate on pure cultures, as discussed and cited on lines 623-624.

      (6) Related to the last section, it is hard to really compare the results of the taxonomic cohort versus the metabolic cohort when the data of one cohort is in the main figure and the other in a supplementary figure. In addition, all the comparisons between the two cohorts seem to be qualitative. For any comparisons, the authors need to do a statistical comparison between the groups of the two cohorts.

      The comparison of the two sets of data is indeed qualitative. This is because these mouse models were run in separate experiments to test separate hypotheses (whether utilization of specific substrates is enough to improve oxalate metabolism or whether specific taxa previously responsive to dietary oxalate were better, which is stated in the manuscript). Given that these experimental models were tested separately, it would not be statistically valid to do a direct statistical comparison, even though the experimental procedures were the same and the only difference was the transplanted bacteria. The separation of the experiments into a main and supplemental figure was done out of necessity given the very large amount of data and many experimental mouse models that were run in this study overall.

      Minor Comments.

      (1) The authors should define 'antinutrients'. This term is not a familiar concept and could create confusion.

      This is defined in line 104 “molecules produced in plants to deter herbivory, disrupt homeostasis by targeting the function of the microbiome, host, or both”

      (2) The authors should explicitly describe the N. albigula, aka White-throated woodrat system, as early as possible in the result section.

      We added some statements about the Swiss-Webster and N. albigula gut microbiota as poor and effective oxalate degraders, respectively, in the second section of the results.

      (3) SW-SW mice exhibited an oxalate-dependent alteration of 219 hepatic genes, with a net increase in activity. In comparison, the SW-NALB mice exhibited an oxalate-dependent alteration of 21 genes with a net decrease in activity. However, the visual representation of the PCoA in Figure 1B showed that the most different samples are the SW-NALB 0% and 1.5%. Could you please explain this difference?

      In Figure 1b, the SW-NALB data are represented by the blue and black data points, which directly overlap with each other.  The SW-SW data are the orange and purple data points, which exhibit very little overlap.  

      (4) Is Table S7 the same as Table S6? If not, there is a missing supplementary table.

      These tables are different.  We ensured that both are present.

      (5) How did the authors test bacterial growth in in vivo studies (Figure 5B)?

      We added a statement to the culturomic section of the methods – we used media with or without oxalate and quantified colony-forming units.
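
      As a hedged illustration of the colony-count arithmetic (a minimal sketch, not the analysis code used in the study), colony-forming units per mL are conventionally estimated as the colony count divided by the product of the dilution factor and the plated volume. The function name, colony counts, dilution, and plated volume below are hypothetical values chosen only for illustration.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_volume_ml: float) -> float:
    """Standard plate-count estimate: colonies / (dilution * plated volume).

    `dilution` is the fraction of the original sample present in the plated
    suspension (e.g. 1e-4 for a 10^-4 dilution).
    """
    if dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return colonies / (dilution * plated_volume_ml)


# Hypothetical counts for one isolate plated on media with and without oxalate.
with_oxalate = cfu_per_ml(colonies=120, dilution=1e-4, plated_volume_ml=0.1)
without_oxalate = cfu_per_ml(colonies=60, dilution=1e-4, plated_volume_ml=0.1)

# A ratio above 1 would indicate more growth with oxalate; below 1, less.
growth_ratio = with_oxalate / without_oxalate
print(f"CFU/mL with oxalate: {with_oxalate:.3g}")
print(f"CFU/mL without oxalate: {without_oxalate:.3g}")
print(f"growth ratio: {growth_ratio:.2f}")
```

      Expressing the two estimates as a ratio gives a simple readout of whether oxalate in the medium promoted or inhibited growth of a given isolate.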

      (6) A section of 16S rRNA metagenomics in the material and method section is not used across the main manuscript.

      These data are presented in Figures S7 and S10, as stated in the results. We added statements in the results to clarify that these figures show the 16S sequencing data.

      (7) Lines 506-511: "Collectively, data from the current and previous studies on the effect of oxalate exposure on the gut microbiota support the hypothesis that the gut microbiota serves as an adaptive organ in which specific, metabolically redundant microbes respond to and eliminate dietary components, for the benefit of themselves, but which can residually protect or harm host health depending on the dietary molecules and gut microbiota composition." - What is the benefit to bacteria in eliminating oxalate? This is highly speculative to this system.

      The benefit to bacteria is stated earlier in that paragraph – “In the current (Figs. 2B, 5B) and previous studies(33,34,64,65), we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present.”

      (8) Lines 504 -506: "Importantly, the near-universal presence of formate metabolism genes suggest that formate may be an even greater source of ecological pressure (Figures S2-S5)."

      - Formate is primarily produced by fermentative anaerobic bacteria, such as Bacteroides, Clostridia, and certain species of Escherichia coli, so formate would be present in anaerobic communities independently of oxalate. How is formate an even greater source of ecological pressure?

      We added a statement about the toxicity of formate to both bacteria and mammalian hosts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

      We thank the reviewer for their positive summary.

      Weakness 1.1

      The present manuscript currently omits the complementary view that the retinotopic map of the visual system might be related to eye movement control. Previous research in non-human primates using microelectrode stimulation has clearly shown that neuronal circuits in the visual system possess motor properties (e.g. Schiller and Stryker 1972, Schiller and Tehovnik 2001). More recent work utilizing Utah arrays, receptive field mapping, and electrical stimulation further supports this perspective, demonstrating that the retinotopic map functions as a motor map. In other words, neurons within a specific area responding to a particular stimulus location also trigger eye movements towards that location when electrically stimulated (e.g. Chen et al. 2020).

      Similarly, recent studies in humans have established a link between the retinotopic variation of human alpha oscillations and eye movements (e.g., Quax et al. 2019, Popov et al. 2021, Celli et al. 2022, Liu et al. 2023, Popov et al. 2023). Therefore, it would be valuable to discuss and acknowledge this complementary perspective on the functional relevance of the presented evidence in the discussion section.

      The reviewer notes that we do not discuss the oculomotor system and alpha oscillations. We agree that the literature relating eye movements and alpha oscillations is relevant.

      At the Reviewer’s suggestion, we added a paragraph on this topic to the first section of the Discussion (section 3.1, “Other studies have proposed … “).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. In doing so, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs was far larger than those reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance of prior work.

      We thank the reviewer for the summary.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Weakness 2.1 Theoretical framing:

      The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion.

      This was helpful feedback. We have rewritten nearly the entire Introduction to frame the study differently. The emphasis is now on the fact that several intracranial studies of spatial tuning of alpha (in both human and macaque) tend to show increases in alpha due to visual stimulation, in contrast to a century of MEG/EEG studies, from Berger to the present, showing decreases. We believe that the discrepancy is due to an interaction between measurement type and brain signals. Specifically, intracranial measurements sum decreases in alpha oscillations and increases in broadband power on the same trials, and both signals can be large. In contrast, extracranial measures are less sensitive to the broadband signals and mostly just measure the alpha oscillation. Our study reconciles this discrepancy by removing the baseline broadband power increases, thereby isolating the alpha oscillation, and showing that with iEEG spatial analyses, the alpha oscillation decreases with visual stimulation, consistent with EEG and MEG results.

      Weakness 2.2 A second important variable here is the spatial scale of measurement.

      It follows that EEG-based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the authors' motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors' question, but they are not discussed in the introduction.

      The new Introduction now explicitly contrasts EEG/MEG with intracranial studies and refers to the studies below.

      Weakness 2.3 Prior studies:

      There are important findings in the literature preceding the authors' work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the authors' team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the authors' findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the authors' approach to these questions does reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      Thank you for these references and suggestions. Some of the references were already included, and the others have been added.

      There is one issue where we disagree with the Reviewer, namely that “the observations of spatially larger RFs for alpha activities is highly predicted”. We agree that alpha oscillations and other low frequency rhythms tend to be less focal than high frequency responses, but there are also low frequency non-rhythmic signals, and these can be spatially focal. We show this by demonstrating that pRFs solved using low frequency responses outside the alpha band (both below and above the alpha frequency) are small, similar to high frequency broadband pRFs, but differing from the large pRFs associated with alpha oscillations. Hence we believe the degree to which signals are focal is more related to the degree of rhythmicity than to the temporal frequency per se. While some of these results were already in the supplement, we now address the issue more directly in the main text in a new section called, “2.5 The difference in pRF size is not due to a difference in temporal frequency.”

      We incorporated additional references into the Introduction, added a new section on low frequency broadband responses to the Results (section 2.5), and expanded the Discussion (section 3.2) to address these new references.

      Weakness 2.4 Statistical testing:

      The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      We reviewed the primary claims made in the manuscript and, for each claim, we specify the supporting analyses and, where appropriate, how we address the issue of nesting. Although some of these analyses were already in the manuscript, many of them are new, including all of the analyses concerning nesting. We believe that putting this information in one place will be useful to the reader, and we now include this text as a new section in the supplement, “Graphical and statistical support for primary claims”.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 2.1:

      Data presentation: In several places, the authors discuss important features of cortical responses as measured with iEEG that need to be carefully considered. This is totally appropriate and a strength of the authors' work, however, I feel the reader would benefit from more depiction of the time-domain responses, to help better understand the authors' frequency domain approach. For example, Figure 1 would benefit from showing some form of voltage trace (ERP) and spectrogram, not just the power spectra. In addition, part (a) of Figure 1 could convey some basic information about the timing of the experimental paradigm.

      We changed panel A of Figure 1 to include the timing of the experimental paradigm, and we added panels C and D to show the electrode time series before and after regression out of the ERP.

      Recommendation 2.2

      Update introduction to include references to prior EEG/iEEG work on spatial distribution across frequency spectrum, and importantly, prior work mapping RFs with different frequencies.

      We have addressed this issue and re-written our introduction. Please refer to our response in Public Review for further details.

      Recommendation 2.3

      Figure 3 has several panels and should be labeled to make it easier to follow. The dashed line in lower power spectra isn't defined in a legend and is missing from the upper panel - please clarify.

      We updated Figure 3 and reordered the panels to clarify how we computed the summary metrics in broadband and alpha for each stimulus location (i.e., the “ratio” values plotted in panel B). We also simplified the plot of the alpha power spectrum. It now shows a dashed line representing a baseline-corrected response to the mapping stimulus, which is defined in the legend and explained in the caption.

      Recommendation 2.4

      Power spectra are always shown without error shading, but they are mean estimates.

      We added error shading to Figures 1, 2 and 3.

      Recommendation 2.5

      The authors deal with voltage transients in response to visual stimulation, by subtracting out the trial-averaged mean (commonly performed). However, the efficacy of this approach depends on signal quality and so some form of depiction for this processing step is needed.

      We added a depiction of the processing steps for regressing out the averaged responses in Figure 1 in an example electrode (panels C and D). We also show in the supplement the effect of regressing out the ERP on all the electrode pRFs. We have added Supplementary Figure 1-2.

      Recommendation 2.6

      I have a similar request for the authors' latency correction of their data, where they identified a timing error and re-aligned the data without ground truth. Again, this is appropriate, but some depiction of the success of this correction is very critical for confirming the integrity of the data.

      We now report more detail on the latency correction, and also point out that any small error in the estimate would not affect our conclusions (4.6 ECoG data analysis | Data epoching). The correction was important for a prior paper on temporal dynamics (Groen et al, 2022), which used data from the same participants and estimated the latency of responses. In this paper, our analyses are in the spectral domain (and discard phase), so small temporal shifts are not critical. We now also link to the public code associated with that paper, which implemented the adjustment and quantified the uncertainty in the latency adjustment.

      More details on the latency adjustment are provided in section 4.6.

      Recommendation 2.7

      In many places the authors report their data shows a 'summary' value, please clarify if this means averaging or summation over a range.

      For both broadband and alpha, we derive one summary value (a scalar) per trial for each stimulus. For broadband, the summary metric is the ratio of power during a given trial to power during blanks, where power in a trial is the geometric mean of the power at each frequency within the defined band. This is equation 3 in the methods, which is now referred to the first time that summary metrics are mentioned in the results. For alpha, the summary metric is the height of the Gaussian from our model-based approach. This is defined in equations 1 and 2, which are also now referred to the first time summary metrics are mentioned in the results.
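
      The broadband summary described above can be sketched in a few lines. This is an illustration only, not the code used for the paper: the function names are hypothetical, the power values are made up, and the sketch assumes that power during blanks is summarized with the same geometric mean across frequencies as power during a trial.

```python
import numpy as np

def geometric_mean(power):
    """Geometric mean of strictly positive power values across frequencies."""
    power = np.asarray(power, dtype=float)
    return float(np.exp(np.mean(np.log(power))))

def broadband_summary(trial_power, blank_power):
    """Ratio of band-summarized power in a trial to that in blanks.

    Assumption (not stated in the response): blank power is summarized
    with the same geometric mean as trial power.
    """
    return geometric_mean(trial_power) / geometric_mean(blank_power)

# Toy values: power at two frequencies within the band (arbitrary units).
ratio = broadband_summary([2.0, 8.0], [1.0, 4.0])
print(ratio)  # close to 2: geometric means are 4 and 2
```

      A ratio above 1 indicates more broadband power during the stimulus than during blanks; taking the geometric mean keeps any single frequency with high power from dominating the summary.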

      We added explanation of the summary metrics in the figure captions and results where they are first used, and also referred to the equations in the methods where they are defined.

      Recommendation 2.8

      The authors conclude: "we have discovered that spectral power changes in the alpha range reflect both suppression of alpha oscillations and elevation of broadband power." It might not have been the intention, but 'discovered' seems overstated.

      We agree and changed this sentence.

      Recommendation 2.9

      Supp Fig 9 is a great effort by the authors to convey their findings to the reader, it should be a main figure.

      We are glad you found Supplementary Figure 9 valuable. We moved this figure to the main text.

      Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      We thank the reviewer for the summary.

      Strengths:

      The study utilizes a unique dataset (ECOG & high-density ECOG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      We thank the reviewer for the positive comments.

      Weaknesses:

      I have three major concerns:

      Weakness 3.1. Low N / no single subject results/statistics:

      The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      We have added substantial changes to deal with subject specific effects, including new results and new figures.

      • Figure 4 now shows variance explained by the alpha pRF broken down by each participant for electrodes in V1 to V3. We also now show a similar figure for dorsolateral electrodes in Supplementary Figure 4-2.

      • Figure 5, which shows results from individual electrodes in V1 to V3, now includes color coding of electrodes by participant to make it clear how the electrodes group by participant. Similarly, for dorsolateral electrodes, we show electrodes grouped by participant in Supplementary Figure 5-1. The same applies to Supplementary Figure 6-2.

      • Supplementary Figure 7-2 now shows the benefits of our model-based approach for estimating alpha broken down by individual participants.

      • We also now include a new section in the supplement, “Graphical and statistical support for primary claims”, that summarizes, for every major claim, what the supporting data are and how we addressed the issue of nesting electrodes by participant.

      Weakness 3.2. Separation between V1-V3 and dorsolateral electrodes:

      Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N. For example, all the 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubts on the robustness of these analyses.

      Electrode assignment was probabilistic to reflect uncertainty in the mapping between location and retinotopic map. The probabilistic assignment is handled in two ways.

      (1) For visualizing results of single electrodes, we simply go with the maximum probability, so no electrode is visualized for both groups of data. For example, Figure 5a (V1-V3) and supplementary Figure 5-1a (dorsolateral electrodes) have no electrodes in common: no electrode is in both plots.

      (2) For quantitative summaries, we sample the electrodes probabilistically (for example Figures 4, 5c). So, if for example, an electrode has a 20% chance of being in V1 to V3, and 30% chance of being in dorsolateral maps, and a 50% chance of being in neither, the data from that electrode is used in only 20% of V1-V3 calculations and 30% of dorsolateral calculations. In 50% of calculations, it is not used at all. This process ensures that an electrode with uncertain assignment makes no more contribution to the results than an electrode with certain assignment. An electrode with a low probability of being in, say, V1-V3, makes little contribution to any reported results about V1-V3. This procedure is essentially a weighted mean, which the reviewer suggests in the recommendations. Thus, we believe there is not a problem of “double counting”.
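
      To illustrate why probabilistic sampling behaves like a weighted mean, here is a minimal sketch. It is not the analysis code from the paper: the function names, the toy values, and the pooling of draws before averaging are all assumptions made for the example. Each electrode is assigned to one of three labels (V1-V3, dorsolateral, neither) on every draw according to its probabilities, and the pooled average per label is compared with a probability-weighted mean.

```python
import numpy as np

def weighted_region_means(values, probs):
    """Probability-weighted mean of electrode values for each label."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)  # shape (n_electrodes, n_labels)
    return (probs * values[:, None]).sum(axis=0) / probs.sum(axis=0)

def sampled_region_means(values, probs, n_draws=20000, seed=0):
    """Assign each electrode to one label per draw, then pool over draws."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n_elec, n_labels = probs.shape
    cum = np.cumsum(probs, axis=1)        # cumulative probabilities per electrode
    u = rng.random((n_draws, n_elec))     # one uniform number per draw and electrode
    # Label = number of cumulative thresholds lying below u (inverse-CDF sampling).
    labels = (u[:, :, None] > cum[None, :, :]).sum(axis=2)
    labels = np.minimum(labels, n_labels - 1)  # guard against rounding in cumsum
    means = np.empty(n_labels)
    for k in range(n_labels):
        mask = labels == k
        means[k] = (mask * values).sum() / mask.sum()
    return means

# Hypothetical pRF sizes for three electrodes; columns are
# P(V1-V3), P(dorsolateral), P(neither). The first row mirrors the
# 20% / 30% / 50% example in the text.
values = [1.0, 2.0, 4.0]
probs = [[0.2, 0.3, 0.5],
         [0.9, 0.1, 0.0],
         [0.0, 1.0, 0.0]]

sampled = sampled_region_means(values, probs)
weighted = weighted_region_means(values, probs)
print(sampled, weighted)
```

      In this toy case the first electrode enters the V1-V3 average in about 20% of draws, so its influence there is proportionally small, and the two estimates agree up to sampling noise.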

      The alternative would have been to use maximum probability for all calculations. However, we think that doing so would be misleading, since it would not take into account uncertainty of assignment, and would thus overstate differences in results between the maps.

      We now clarify in the Results that for probabilistic calculations, the contribution of an electrode is limited by the likelihood of assignment (Section 2.3). We also now explain in the methods why we think probabilistic sampling is important.

      Weakness 3.3. Alpha pRFs are larger than broadband pRFs:

      First, as broadband pRF models were on average better fit to the data than alpha pRF models (dark bars in Supp Fig 3. Top row), I wonder if this could entirely explain the larger Alpha pRF (i.e. worse fits lead to larger pRFs). There was no analysis to rule out this possibility.

      We addressed this question in a new paragraph in Discussion section 3.1 (“What is the function of the large alpha pRFs?”, paragraph beginning… “Another possible interpretation is that the poorer model fit in the alpha pRF is due to lower signal-to-noise”). This paragraph both refers to prior work on the relationship between noise and pRF size and to our own control analyses (Supplementary Figure 5-2).

      Weakness 3.4 Statistics

      Second, examining closely the entire 2.4 section there wasn't any formal statistical test to back up any of the claims (not a single p-value is mentioned). It is crucial in my opinion to support each of the main claims of the paper with formal statistical testing.

      We agree that it is important for the reader to be able to link specific results and analyses to specific claims. We are not convinced that null hypothesis statistical testing is always the best approach. This is a topic of active debate in the scientific community.

      We added a new section that concisely states each major claim and explicitly annotates the supporting evidence (Section 4.7). Please also refer to our responses to Reviewer #2 regarding statistical testing (Weakness 2.4, “Statistical testing”).

      Weakness 3.5 Summary

      While I judge these issues as crucial, I can also appreciate the considerable effort and thoughtfulness that went into this study. I think that addressing these concerns will substantially raise the confidence of the readership in the study's findings, which are potentially important and interesting.

      We again thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for how to address the three major concerns:

      Suggestion 3.1.

      I am very well aware that it's very hard to have n=30 in a visual cortex ECOG study. That's fine. Best practice would be to have a linear mixed effects model with patients as a random effect. However, for some figures with just 3-4 patients (Figure 4,5) the sample size might be too small even for that. At the very minimum, I would expect to show in figures/describe in text all results per patient (perhaps one can do statistics within each patient, and show for each patient that the effect is significant). Even in primate studies with just two subjects it is expected to show that the results replicate for subject A and B. It is necessary to show that your results don't depend on a single unrepresentative subject. And if they do, at least be transparent about it.

      We have addressed this thoroughly. Please see response to Weakness 3.1 (“Low N / no single subject results/statistics”).

      Suggestion 3.2.

      I just don't get it. I would simply assign an electrode to V1-V3 or dorsolateral cortex based on which area has the highest probability. It doesn't make sense to me that an electrode that has 60% of being in dorsolateral cortex and only 10% to be in V1-V3 would be assigned as both V1-V3 and dorsolateral. Also, what's the rationale to include such electrode in the analysis for let's say V1-V3 (we have weak evidence to believe it's there)? I would either assign electrodes based on the highest probability, or alternatively do a weighted mean based on the probability of each electrode belonging to each region group (e.g. electrode with 40% to be in V1-V3, will get twice the weight as an electrode who has 20% to be in V1-V3) but this is more complicated.

      We have addressed this issue. Please refer to our response in Public Review (“Weakness 3.2 Separation between V1-V3 and dorsolateral”) for details.

      Suggestion 3.3.

      First, to exclude the possibility that alpha pRF are larger simply because they have a worse fit to the neural data, I would show if there is a correlation between the goodness-of-fit and pRF size (for alpha and broadband signals, separately). No [negative] correlation between goodness-of-fit and pRF size would be a good sign. I would also compare alpha & broadband receptive field size when controlling for the goodness-of-fit (selecting electrodes with similar goodness-of-fit for both signals). If the results replicate this way it would be convincing.

      Second, there are no statistical tests in section 2.4, possibly also in others. Even if you employ bootstrap / Monte-Carlo resampling methods you can extract a p-value.

      We have addressed this issue. Please refer to our response in Public Review Point 3.3 (“Alpha pRFs are larger than broadband pRFs”) for further details.

      Suggestion 3.4.

      Also, I don't understand the resampling procedure described in lines 652-660: "17.7 electrodes were assigned to V1-V3, 23.2 to dorsolateral, and 53 to either " - but 17.7 + 23.2 doesn't add up to 53. It also seems as if you assign visual areas differently in this resampling procedure than in the real data - "and randomly assigned each electrode to a visual area according to the Wang full probability distributions". If you assign in your actual data 27 electrodes to both visual areas, the same should be done in the resampling procedure (I would expect exactly 35 V1-V3 and 45 dorsolateral electrodes in every resampling, just the pRFs will be shuffled across electrodes).

      We apologize for the confusion.

      We fixed the sentence above, clarified the caption to Table 2, and also explained the overall strategy of probabilistic resampling better. See response to Public Review point 3.2 for details.

      Suggestion 3.5.

      These are rather technical comments but I believe they are crucial points to address in order to support your claims. I genuinely think your results are potentially interesting and important but these issues need to be first addressed in a revision. I also think your study may carry implications beyond just the visual domain, as alpha suppression is observed for different sensory modalities and cortical regions. Might be useful to discuss this in the discussion section.

      Agree. We added a paragraph on this point to the Discussion (very end of 3.2).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      We thank the reviewer for the balanced and informative summary.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Neuroligins 1, 2 and 3 specifically from astrocytes, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses an important and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, no alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes, are observed.

      We are also grateful for this reviewer’s constructive comments.

      One caveat to this study is that the authors do not directly provide evidence that their Tamoxifen-inducible conditional deletion paradigm does indeed result in efficient deletion of all three Neuroligins from astrocytes. Using a Cre-dependent tdTomato reporter line, they show that tdTomato expression is efficiently induced by the current paradigm, and they refer to a prior study showing efficient deletion of Neuroligins from neurons using the same conditional Nlgn1-3 mouse lines but a different Cre driver strategy. However, neither of these approaches directly provide evidence that all three Neuroligins are indeed deleted from astrocytes in the current study. In contrast, Stogsdill et al. employed FACS and qPCR to directly quantify the loss of Nlgn2 mRNA from astrocytes. This leaves the current Golf et al. study somewhat vulnerable to the criticism, however unlikely, that their lack of synaptic effects may be a consequence of incomplete Neuroligin deletion, rather than a true lack of effect of astrocytic Neuroligins.

      The concern is valid. In the original submission of this paper, we did not establish that the Cre recombinase we used actually deleted neuroligins in astrocytes. We have now addressed this issue in the revised paper with new experiments as described below.

      However, the reviewer’s impression that the Stogsdill et al. paper confirmed full deletion of Nlgn2 is a misunderstanding of the data in that paper. The reviewer is correct that Stogsdill et al. performed FACS to test the efficacy of the GLAST-Cre mediated deletion of Nlgn2-flox mice, followed by qRT-PCR comparing heterozygous with homozygous mutant mice. With their approach, no wild-type control could be used, as these would lack reporter expression. However, this experiment does NOT allow conclusions about the degree of recombination, both overall recombination (i.e. recombination in all astrocytes regardless of TdT+) and recombination in TdT+ astrocytes because it doesn’t quantify recombination. To quantify the degree of recombination, the paper would have had to perform genomic PCR measurements.  

      The problem with the data on the degree of recombination in the Stogsdill et al. (2017) paper, as we understand them, is two-fold.

      First, the GLAST-Cre line only targets ~40-70% of astrocytes, at least as evidenced by highly sensitive Cre-reporter mice in a variety of studies using this Cre line. The 40-70% variation is likely due to differences in the reporter mice and the tamoxifen injection schedule used. In comparison, we are targeting most astrocytes using the Aldh1l1-CreERT2 mice. Moreover, GLAST-Cre mice exhibit neuronal off-targeting, consistent with at least some of the remaining Nlgn2 qRT-PCR signal in the FACS-sorted cells. As we describe next, this signal also likely comes from astrocytes where recombination was incomplete. This is the reason why we, like everyone else, are now using the Aldh1l1-Cre line that has been shown to be more efficient both in terms of the overall targeting of astrocytes (i.e. nearly complete) and the level of recombination observed in reporter(+) astrocytes.

      Second, Stogsdill et al. detected a significant decrease in the Nlgn2 qRT-PCR signal in the FACS-sorted homozygous Nlgn2 KO cells compared to the heterozygous Nlgn2 KO cells, but the Nlgn2 qRT-PCR signal was still quite large. The data are presented as normalized to the HET condition. As a result, we don’t know the true level of gene deletion (i.e. compared to TdT- astrocytes). For example, based on the Stogsdill et al. data, the HET manipulation could have induced only a 20% reduction in Nlgn2 mRNA levels in TdT(+) astrocytes, in which case the KO would have produced a 40% reduction in Nlgn2 mRNA in TdT(+) astrocytes. Moreover, it is possible, based on our own experience with the GLAST-Cre line, that the reporter may also not turn on in some astrocytes where other alleles have been independently recombined, just as some astrocytes that are TdT(+) would still be wild-type or heterozygous for Nlgn2. Thus, it is impossible to calculate the actual percentage of recombination from these data, even in TdT(+) cells, absent PCR of genomic DNA from isolated cells. Alternatively, comparison of mRNA levels using primers sensitive to floxed sequences in wild-type controls versus cKO mice would have also yielded a much better idea of the recombination efficiency.
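
      To make the normalization problem concrete, the following minimal sketch converts a HET-normalized KO signal into a loss relative to wild type for several assumed HET losses. The function name and all numbers are hypothetical illustrations; the ratio of 0.75 is simply what the 20% / 40% example above implies, not a value taken from Stogsdill et al.

```python
def ko_reduction_vs_wildtype(ko_over_het: float, het_reduction: float) -> float:
    """Return the fractional mRNA loss in KO cells relative to wild type.

    ko_over_het   -- KO signal divided by HET signal (what a HET-normalized plot shows)
    het_reduction -- fractional loss in HET cells relative to wild type (not measured)
    """
    if ko_over_het < 0.0:
        raise ValueError("ko_over_het must be non-negative")
    if not 0.0 <= het_reduction <= 1.0:
        raise ValueError("het_reduction must lie in [0, 1]")
    het_level = 1.0 - het_reduction      # HET signal as a fraction of wild type
    ko_level = ko_over_het * het_level   # KO signal as a fraction of wild type
    return 1.0 - ko_level


if __name__ == "__main__":
    ratio = 0.75  # hypothetical KO/HET ratio, chosen to match the 20% / 40% example
    for het_reduction in (0.0, 0.2, 0.5):
        loss = ko_reduction_vs_wildtype(ratio, het_reduction)
        print(f"assumed HET loss {het_reduction:.0%} -> implied KO loss {loss:.1%}")
```

      The same HET-normalized ratio is compatible with very different absolute losses, which is why a wild-type reference or genomic PCR is needed to pin down the recombination efficiency.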

      In summary, it is unclear whether the Nlgn2 deletion in the Stogsdill et al. paper was substantial or marginal – it is simply impossible to tell.

      Reviewer #3 (Public Review):

      This study investigates the roles of astrocytes in the regulation of synapse development and astrocyte morphology using conditional KO mice carrying mutations of three neuroligins1-3 in astrocytes with the deletion starting at two different time points (P1 and P10/11). The authors use morphological, electrophysiological, and cell-biological approaches and find that there are no differences in synapse formation and astrocyte cytoarchitecture in the mutant hippocampus and visual cortex. These results differ from the previous results (Stogsdill et al., 2017), although the authors make several discussion points on how the differences could have been induced. This study provides important information on how astrocytes and neurons interact with each other to coordinate neural development and function. The experiments were well-designed, and the data are of high quality.

      We also thank this reviewer for helpful comments!

      Recommendations for the authors:

      This project was meant to rigorously test the intriguing overall question whether neuroligins, which are abundantly expressed in astrocytes, regulate synapse formation as astrocytic synapse organizers. The goal of the paper was NOT to confirm or dispute the conclusion by Stogsdill et al. (Nature 2017) that Nlgn2 expressed in astrocytes is essential for excitatory synapse formation and that astrocytic Nlgn1-3 are required for proper astrocyte morphogenesis. Instead, the project was meant to address the much broader question whether the abundant expression of any neuroligin, not just Nlgn2, in astrocytes is essential for neuronal excitatory or inhibitory synapse formation and/or for the astrocyte cytoarchitecture. We felt that this was an important question independent of the Stogsdill et al. paper. In our experiments, we analyzed young adult mice, a timepoint that was chosen deliberately to avoid mistaking a possible developmental delay for a fundamental function that extends beyond development.

      We do recognize that the conclusion by Stogsdill et al. (2017) that Nlgn2 expression in astrocytes is essential for excitatory synapse formation was very exciting to the field but contradicted a large literature demonstrating that Nlgn2 protein is exclusively localized to inhibitory synapses and absent from excitatory synapses (to name just a few papers, see Graf et al., Cell 2004; Varoqueaux et al., Eur. J. Cell Biol. 2004; Patrizi et al., PNAS 2008; Hoon et al., J. Neurosci. 2009). In addition, the conclusion of Stogsdill et al. that astrocytic Nlgn2 specifically drove excitatory synapse formation was at odds with previous findings documenting that the constitutive deletion of Nlgn2 in all cells, including astrocytes, has no effect on excitatory synapse numbers (again, to name a few papers, see Varoqueaux et al., Neuron 2006; Blundell et al., Genes Brain Behav. 2008; Poulopoulos et al., Neuron 2009; Gibson et al., J. Neurosci. 2009). These contradictions conferred further urgency to our project, but please note that this project was primarily driven by our curiosity about the function of astrocytic neuroligins, not by a fruitless desire to test the validity of one particular Nature paper.

      The general goal of our paper notwithstanding, few papers from our lab have received as much attention and as many negative comments on social media as this paper when it was published as a preprint. Because we take these criticisms seriously, we have over the last year performed extensive additional experiments to ensure that our findings are well founded. We feel that, on balance, our data are incompatible with the notion that astrocytic neuroligins play a fundamental role in excitatory synapse formation but are consistent with other prior findings obtained with neuroligin KO mice. In the new data we added to the paper, we not only characterized the Cre-mediated deletion of neuroligins in depth, but also employed an independent second system (human neurons cultured on mouse glia) to further validate our conclusions as described below. Although we believe that our results are incompatible with the notion that astrocytic neuroligins fundamentally regulate excitatory or inhibitory synapse formation, we also conclude with regret that we still don’t know what astrocytic neuroligins actually do. Thus, the function of astrocytic neuroligins, as there surely must be one, remains a mystery.

      Finally, there are many possible explanations for the discrepancies between our conclusions and those of Stogsdill et al. as described in our paper. Most of these explanations are technical and may explain why not only our results, but also those of many previous studies from multiple labs, are inconsistent with the conclusions by Stogsdill et al. (2017), as discussed in detail in the revised paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper is very clear and well written. I have only one comment and that is to increase the sizes of Figs 2, 4 and 6 so that the imaging panels can be seen more clearly. Also, although I know the n numbers are provided in the figure legends, the authors may help the reader by providing them in the results when key data and findings are reported.

      We agree and have followed the reviewer’s suggestions as best as we could.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the strength and importance of the claims that the authors make, I would highly recommend adding some quantitative evidence regarding the efficacy of deletion in astrocytes, e.g. using the same strategy as in Stogsdill et al. As unlikely as it may be that Neuroligin deletion is in fact incomplete, this possibility cannot be excluded unless directly measured. To avoid future discussions on this subject, it seems that the onus is on the authors to provide this information.

      We concur that this is an important point and have devoted a year-long effort to address it. Note, however, that the strategy employed by Stogsdill et al. does not actually allow conclusions about their recombination efficiency. As described above, it only allows the conclusion that some recombination took place. The Stogsdill et al. Nature paper (2017) is a bit confusing on this point. This approach is thus not appropriate to address the question raised by the reviewer.

      We have performed two experiments to address the issue raised by the reviewer.

      First, we used a viral (i.e. AAV2/5) approach to express Rpl22 with a triple HA-tag, also known as Ribotag, which allows us to purify ribosome-bound mRNA from targeted cells for downstream gene expression analysis. The novel construct is driven by the GfaABC1D promoter and includes two additional features which make it particularly useful. One is a membrane-targeted Lck-mVenus upstream of Ribotag, followed by a self-cleaving P2A sequence, which allows easy visualization of targeted astrocytes. The other is a cassette of four copies of six miRNA targeting sequences (4x6T) for miR-124, as was recently published (Gleichman et al., 2023), to eliminate off-target expression in neurons. Based on qPCR analysis, the updated construct allowed >95% de-enrichment of neuronal mRNA and slightly improved observed recombination rates (~10% per gene) relative to an earlier version without 4x6T. Mice that were injected with tamoxifen at P1, similar to other experiments in the paper, were then stereotactically injected at ~P35-40 within the dorsal hippocampus with AAV2/5-GfaABC1D-Lck-mVenus-P2A-Rpl22-HA-4x6T. Approximately 3 weeks later, acute slices were prepared, visualized for fluorescence, and both CA1 and nearby cortex that was partially targeted were isolated for downstream ribosome affinity purification with HA antibodies. Total RNA was saved as input. qPCR was performed using assays that are sensitive to the exons that are floxed in the Nlgn123 cKO mice, so that our quantifications are not confounded by potential differences in nonsense-mediated decay. Our control data reveal a striking enrichment of an astrocyte marker gene (e.g. aquaporin-4) and de-enrichment of genes for other cell types. In the CA1, we observed robust loss of Nlgn3 (~96%), Nlgn2 (~86%), and Nlgn1 (65%) gene expression. In the cortex, we observed a similarly robust loss of Nlgn3 (93%), Nlgn2 (83%), and Nlgn1 (72%) expression. Given that our targeting of astrocytes based on Ai14 Cre-reporter mice was ~90-99%, these reductions are striking and definitive. The existence of some residual transcript reflects the presence of a small population of astrocytes heterozygous for Nlgn2 and Nlgn3. In contrast, Nlgn1 appears more difficult to recombine and it is likely that some astrocytes are either heterozygous or homozygous knockout cells. Although it is thus possible that Nlgn1 could provide some compensation in our experiments, it is worth noting that Stogsdill et al. found that only Nlgn2 and Nlgn3 knockdown with shRNAs resulted in impaired astrocyte morphology by P21. Moreover, they found that Nlgn2 cKO in astrocytes with PALE of a Cre-containing pDNA impaired astrocyte morphology in a gene-dosage dependent manner and suppressed excitatory synapse formation at P21. Thus, our inability to delete all of Nlgn1 doesn’t readily explain contradictions between our findings and theirs.
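
      For readers unfamiliar with how a percentage loss is typically derived from qPCR, here is a minimal sketch using the common 2^-ΔΔCt convention. This response does not state which quantification method was used, so the convention, the function name, the reference-gene normalization, and the Ct values in the demo are all assumptions for illustration only, and the calculation assumes 100% amplification efficiency.

```python
def percent_loss_ddct(ct_target_ctrl: float, ct_ref_ctrl: float,
                      ct_target_cko: float, ct_ref_cko: float) -> float:
    """Percent loss of a target transcript in cKO versus control (2^-ddCt).

    Each Ct is a threshold cycle; 'ref' is a reference gene used for
    normalization. Assumes perfect doubling per cycle.
    """
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # normalized target level in control
    delta_cko = ct_target_cko - ct_ref_cko      # normalized target level in cKO
    ddct = delta_cko - delta_ctrl               # extra cycles needed in cKO
    remaining = 2.0 ** (-ddct)                  # fraction of transcript remaining
    return 100.0 * (1.0 - remaining)


if __name__ == "__main__":
    # Hypothetical Ct values: the cKO sample needs one extra cycle.
    print(percent_loss_ddct(25.0, 20.0, 26.0, 20.0))
```

      Under this convention, a shift of one cycle corresponds to a 50% loss, and each additional cycle halves the remaining transcript again.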

      Second, in an independent approach we have cultured glia from mouse quadruple conditional Nlgn1234 KO mice and infected the glia with lentiviruses expressing inactive (DCre, control) or active Cre-recombinase. We confirmed complete recombination by PCR. We then cultured human neurons forming excitatory synapses on the glia expressing or lacking neuroligins and measured the frequency and amplitude of mEPSCs as a proxy for synapse numbers and synaptic function. As shown in the new Figure 9, we detected no significant changes in mEPSCs, demonstrating in this independent system that the glial neuroligins do not detectably influence excitatory synapse formation.

      (2) Along the same lines, the authors should be careful not to overstate their findings in this direction. For example, the figure caption for Figure 2 reads 'Nlgn1-3 are efficiently and selectively deleted in astrocytes by crossing triple Nlgn1-3 conditional KO mice with Aldh1l1-CreERT2 driver mice and inducing Cre-activity with tamoxifen early during postnatal development'. This is not technically correct and should be modified to reflect that the authors are not in fact assessing deletion of Nlgn1-3, but only expression of a tdTomato reporter.

      We agree – this is essentially the same criticism as comment #1.

      (3) In general, the animal numbers used for the experiments are rather low. With an n = 4 for most experiments, only large abnormalities would be detected anyway, while smaller alterations would not reach statistical significance due to the inherent biological and technical variance. For the most part, this is not a concern, since there really is no difference between WTs and Nlgn1-3 cKOs. However, trends are observed in some cases, and it is conceivable that these would become significant changes with larger n's, e.g. Figure 3H (Vglut2); Figure 4E (VGlut2 S.P., D.G.); Figure 6D (Vglut2). Increasing the numbers to n = 6 here would greatly strengthen the claims that no differences are observed.

      We concur that small differences would not have been detected in our experiments but feel that given the very large phenotypes of the neuroligin deletions in neurons and of the phenotypes reported by Stogsdill et al. (2017), which also did not employ a large number of animals, a very small phenotype in astrocytes would not have been very informative.
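
      The sensitivity question raised here can be explored with a small Monte Carlo sketch. It assumes Gaussian data with unit variance and a two-sided, two-sample Student's t-test at the 5% level, which may differ from the tests actually used in the paper; the function name and the effect sizes are hypothetical.

```python
import math
import random

# Two-sided 5% critical values of Student's t for df = 2n - 2
# (n = 4 per group -> df = 6; n = 6 per group -> df = 10).
T_CRIT = {4: 2.447, 6: 2.228}


def _mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def simulated_power(n_per_group, effect_size, n_sims=4000, seed=0):
    """Fraction of simulated experiments in which |t| exceeds the critical value.

    effect_size is the true group difference in units of the within-group SD.
    """
    rng = random.Random(seed)
    t_crit = T_CRIT[n_per_group]
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        mean_a, var_a = _mean_and_variance(a)
        mean_b, var_b = _mean_and_variance(b)
        pooled_var = (var_a + var_b) / 2.0          # equal group sizes
        se = math.sqrt(pooled_var * 2.0 / n_per_group)
        if abs((mean_b - mean_a) / se) > t_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in (4, 6):
        for d in (1.0, 2.0, 3.0):
            print(f"n={n} per group, effect={d} SD: power ~ {simulated_power(n, d):.2f}")
```

      Under these assumptions, only differences of several standard deviations are detected reliably at n = 4, which is consistent with the statement that small differences would not have been detected.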

      Minor points:

      (1) Please state the exact genetic background for the mouse lines used.

      Our lab generally uses hybrid CD1/Bl6 mice to avoid artifacts produced by inbred genetic mutations in so-called ‘pure’ lines, especially Bl6 mice. This standard protocol was followed in the present study. Thus, the mice are on a mixed CD1/Bl6 hybrid background.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 4 demonstrates that neuroligin 1-3 deletions restricted to astrocytes do not affect the number of excitatory and inhibitory synapses in layer IV of the primary visual cortex. This conclusion could be further strengthened if the authors could provide electrophysiological evidence such as mE/IPSCs.

      We agree but have chosen a different avenue to further test our conclusions because slice electrophysiological experiments are time-consuming, labor intensive, and difficult to quantitate, especially in cortex.

      Specifically, we have co-cultured human neurons with astrocytes that either contain or lack neuroligins (new Fig. 9). With this experimental design, we have total control over ALL neuroligins in astrocytes. Electrophysiological recordings then demonstrated that the complete deletion of all glial neuroligins has no effect on mEPSC frequencies and amplitudes. Although clearly much more needs to be done, the new results confirm in an independent system that glial neuroligins have no effect on synapse formation in the neurons, even though neurons depend on astrocytes for synaptogenic factors as Ben Barres brilliantly showed a decade ago. However, it is important to note that dissociated glia in culture, while synaptogenic, are reactive and may not faithfully recapitulate all roles of astrocytes in synaptogenesis.

      (2) It would help readers if the images showing the punctate double marker stainings of excitatory/inhibitory synapses are presented in merged colors (i.e., yellow colors for red and green puncta colors).

      We have tried to improve the visualization of the rather voluminous studies we performed and illustrate in the figures as best as we could.

      (3) The resolutions of the images in the figures are not good, although I guess it is because the images are for review processes.

      We apologize and would like to assure the reviewer that we are supplying high-resolution images to the journal.

      (4) Typos in lines 82 and 274.

      We have corrected these errors.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their thoughtful feedback. We have made substantial revisions to the manuscript to address each of their comments, as we detail below. We want to highlight one major change in particular that addresses a concern raised by both reviewers: the role of the drift rate in our models. Motivated by their astute comments, we went back through our models and realized that we had made a particular assumption that deserved more scrutiny. We previously assumed that the process of encoding the observations made correct use of the objective, generative correlation, but then the process of calculating the weight of evidence used a mis-scaled, subjective version of the correlation. These assumptions led us to scale the drift rate in the model by a term that quantified how the standard deviation of the observation distribution was affected by the objective correlation (encoding), but to scale the bound height by the subjective estimate of the correlation (evidence weighing). However, we realized that encoding may also depend on the subjective correlation experienced by the participant. We have now tested several alternative models and found that the best-fitting model assumes that a single, subjective estimate of the correlation governs both encoding and evidence weighing. An important consequence of updating our models in this way is that we can now account for the behavioral data without needing the additional correlation-dependent drift terms (which, as reviewer #2 pointed out, were difficult to explain).

      We also note that we changed the title slightly, replacing “weighting” with “weighing” for consistency with our usage throughout the manuscript.

      Please see below for more details about this important point and our responses to the reviewers’ specific concerns. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants misestimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimates of the correlation magnitude correspond to the normative strategy.

      We thank the reviewer for raising this interesting point, which we now address directly with new analyses including model fits (pp. 15–24). These analyses show that the participants were computing correlation-dependent weights of evidence from observation distributions that reflected suboptimal misestimates of correlation magnitudes. This strategy is normative in the sense that it is the best that they can do, given the encoding suboptimality. However, as we note in the manuscript, we do not know the source of the encoding suboptimality (pp. 23–24). We thus do not know if there might be a strategy they could have used to make the encoding more optimal.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways:

      (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and

      (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.

      We appreciate this subtle, but important, point. We now clarify that the DDM we use includes degrees of freedom that are consistent with normative decision processes that rely on the imperfect knowledge that participants have about the generative process on each trial, specifically: 1) a single drift-rate parameter that is fit to data across different values of the mean of the generative distribution, which is based on the standard assumption for these kinds of task conditions in which stimulus strength is varied randomly from trial-to-trial and thus prevents the use of exact logLR (which would require stimulus strength-specific scale factors; Gold and Shadlen, 2001); 2) the use of a collapsing bound, which in certain cases (including our task) is thought to support a stimulus strength-dependent calibration of the decision variable to optimize decisions (Drugowitsch et al, 2012); and 3) free parameters (one per correlation) to account for subjective estimates of the correlation, which affected the encoding of the observations that are otherwise weighed in a normative manner in the best-fitting model.

      Also, to clarify our terminology, we define the objective evidence strength as the expected logLR in a given condition, which for our task is dependent on both the distance of the mean from the decision boundary and the correlation (p. 7). 

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      These points are now addressed directly in Results (pp. 23–24) and Figure 7 supplemental figures 1–3. Specifically, we show that, as the reviewer correctly surmised above, empirical correlations computed on each trial tended to be biased towards zero (Fig 7–figure supplement 1). However, two other analyses were not consistent with the idea that participants’ decisions were based on trial-by-trial estimates of the empirical correlations: 1) those with the shortest RTs did not have the most-biased estimates (Fig 7–figure supplement 2), and 2) there was no systematic relationship between objective and subjective fit correlations across participants (Fig 7–figure supplement 3).
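
      The general tendency of sample correlations from few observations to shrink towards zero can be illustrated with a short simulation. This is only a sketch of the well-known small-sample bias of Pearson's r for bivariate Gaussian data; the correlation value, the number of pairs per trial, and the function names are hypothetical and are not taken from the task or the manuscript's analyses.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_sample_correlation(rho, n_pairs, n_trials=20000, seed=0):
    """Average per-trial sample correlation when the generative correlation is rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_trials):
        xs, ys = [], []
        for _ in range(n_pairs):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(rho * z1 + scale * z2)  # unit variance, correlation rho with z1
        total += pearson_r(xs, ys)
    return total / n_trials


if __name__ == "__main__":
    for n_pairs in (5, 10, 20):
        print(n_pairs, round(mean_sample_correlation(0.6, n_pairs), 3))
```

      The shrinkage is larger when fewer pairs are available, so a purely trial-by-trial estimate would predict stronger underestimation on shorter trials.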

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      We appreciate this valuable comment, and we agree that we previously neglected the potential impact of correlation misestimates on evidence strength. As we now clarify, the correlation enters these models in two ways: 1) via its effect on how the observations are encoded, which involves scaling both the drift and the bound; and 2) via its effect on evidence weighing, which involves scaling only the bound (pp. 15–18). We previously assumed that only the second form of scaling might involve a subjective (mis-)estimate of the correlation. We now examine several models that also include the possibility of either or both forms using subjective correlation estimates. We show that a model that assumes that the same subjective estimate drives both encoding and weighing (the “full-rho-hat” model) best accounts for the data. This model provides better fits (after accounting for differences in numbers of parameters) than models with: 1) no correlation-dependent adjustments (“base” model), 2) separate drift parameters for each correlation condition (“drift” model), 3) optimal (correlation-dependent) encoding but suboptimal weighing (“bound-rho-hat” model, which was our previous formulation), 4) suboptimal encoding and weighing (“scaled-rho-hat” model), and 5) optimal encoding but suboptimal weighing and separate correlation-dependent adjustments to the drift rate (“bound-rho-hat plus drift” model). We have substantially revised Figures 5–7 and the associated text to address these points.

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

      We now include new analyses in Fig. 7 that demonstrate how much participants' choices and RT deviate from: 1) an ideal observer using the objective correlations, and 2) an observer who failed to adjust for the fit subjective correlation when weighing the evidence (i.e., using the subjective correlation for encoding but a correlation of zero for weighing). We now indicate that participants’ performance was quite close to that predicted by the ideal observer (using the true, objective correlation) for many conditions. Thus, we agree that they might not have had the impetus to optimize the decision process further, assuming it were possible under these task conditions.

      Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for misestimation of the underlying correlations.
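
      The direction of this weighting can be checked with a minimal sketch. It assumes a simplified generative model that may not match the task exactly: each pair is bivariate Gaussian with equal variances, correlation rho, and both means at +mu under one hypothesis and -mu under the other. All function names and parameter values are hypothetical. Under these assumptions the log-likelihood ratio of a pair is 2·mu·(x1 + x2) / (sigma²·(1 + rho)), so the weight on the summed observation grows as rho becomes more negative and shrinks as rho becomes more positive.

```python
import math


def pair_log_lr(x1, x2, mu, sigma, rho):
    """Closed-form logLR of one pair for means (+mu, +mu) versus (-mu, -mu)."""
    return 2.0 * mu * (x1 + x2) / (sigma ** 2 * (1.0 + rho))


def _log_density(x1, x2, mean, sigma, rho):
    """Log density of a bivariate Gaussian with equal means and variances."""
    a, b = x1 - mean, x2 - mean
    quad = (a * a - 2.0 * rho * a * b + b * b) / (sigma ** 2 * (1.0 - rho ** 2))
    log_norm = math.log(2.0 * math.pi * sigma ** 2 * math.sqrt(1.0 - rho ** 2))
    return -0.5 * quad - log_norm


def pair_log_lr_direct(x1, x2, mu, sigma, rho):
    """Same logLR, computed directly from the two densities (sanity check)."""
    return _log_density(x1, x2, mu, sigma, rho) - _log_density(x1, x2, -mu, sigma, rho)


def expected_log_lr(mu, sigma, rho):
    """Expected logLR per pair when the (+mu, +mu) hypothesis is true."""
    return 4.0 * mu ** 2 / (sigma ** 2 * (1.0 + rho))


def matched_mu(mu_uncorrelated, rho):
    """Mean that keeps the expected logLR equal to the uncorrelated case."""
    return mu_uncorrelated * math.sqrt(1.0 + rho)


if __name__ == "__main__":
    for rho in (-0.6, 0.0, 0.6):  # hypothetical correlation values
        print(rho, pair_log_lr(1.0, 1.0, 1.0, 1.0, rho),
              expected_log_lr(matched_mu(1.0, rho), 1.0, rho))
```

      In this simplified setting, scaling the mean by sqrt(1 + rho) makes every pair equally informative on average, which is the sense in which behavior should not depend on the correlation if the evidence is weighed correctly.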

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. 

      We appreciate this valuable feedback. We have revised the text in several places to make these important points more clearly. For example, in the Introduction we now clarify that “The weight of evidence is computed as a scaled version of each observation (the scaling can be applied to the observations or to the bound, which are mathematically equivalent; Green and Swets, 1966) to form the logLR” (p. 3). We also provide more details and intuition in the Results section for how and why we implemented the DDM the way we did. In particular, we now emphasize that the correlation enters these models in two ways: 1) via its effect on encoding the observations, which scales both the drift and the bound; and 2) via its effect on evidence weighing, which scales only the bound (pp. 15–18).

      Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      We again appreciate this valuable feedback and have made a number of revisions to try to clarify these points. In addition to addressing the equivalence of scaling the evidence and the bound in the Introduction, we have added the following section to Results (Results, p.18):

      “Note that scaling the bound in these formulations follows conventions of the DDM, as detailed above, to facilitate interpretation of the parameters. These formulations also raise an apparent contradiction: the “predefined” bound is scaled by subjective estimates of the correlation, but the correlation was randomized from trial to trial and thus could not be known in advance. However, scaling the bound in these ways is mathematically equivalent to using a fixed bound on each trial and scaling the observations to approximate logLR (see Methods). This equivalence implies that in the brain, effectively scaling a “predefined” bound could occur when assigning a weight of evidence to the observations as they are presented.”

      We also note in Methods (pp. 40–41):

      “In the DDM, this scaling of the evidence is equivalent to assuming that the decision variable accumulates momentary evidence of the form (x1 + x2) and then dividing the bound height by the appropriate scale factor. An alternative approach would be to scale both the signal and noise components of the DDM by the scale factor. However, scaling the bound is both simpler and maintains the conventional interpretation of the DDM parameters in which the bound reflects the decision-related components of the evidence accumulation process, and the drift rate represents sensory-related components.”
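      The equivalence described in this passage can be illustrated with a minimal sketch. It assumes a discrete-time accumulator with a fixed (non-collapsing) bound; the names `first_crossing`, `pair_sums`, `scale`, and `bound`, and all numerical values, are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def first_crossing(evidence, bound):
    """Return (step index, choice sign) of the first bound crossing, or (None, 0)."""
    dv = np.cumsum(evidence)                      # decision variable over time
    hits = np.flatnonzero(np.abs(dv) >= bound)    # steps at or beyond either bound
    if hits.size == 0:
        return None, 0
    i = int(hits[0])
    return i, int(np.sign(dv[i]))

rng = np.random.default_rng(0)
# Stand-in for the momentary evidence (x1 + x2) on each step (assumed values).
pair_sums = rng.normal(loc=0.2, scale=1.0, size=500)
scale = 1.6   # hypothetical correlation-dependent scale factor
bound = 5.0   # hypothetical fixed bound

# Option A: scale each observation, keep the bound fixed.
scaled_evidence = first_crossing(scale * pair_sums, bound)
# Option B: accumulate unscaled (x1 + x2), divide the bound by the scale factor.
scaled_bound = first_crossing(pair_sums, bound / scale)
```

      Under these assumptions both options terminate on the same step with the same choice, because multiplying every observation by a positive constant multiplies the accumulated sum by that constant.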

      We believe we provide strong evidence that participants adjust their evidence weighing to account for the correlations (see response below), but we remain agnostic as to how exactly this weighing is implemented in the brain.

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a posthoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). 
There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

      We thank the reviewer for this thoughtful and valuable feedback. We have thoroughly updated the modeling section to include new analysis and clearer descriptions and interpretations of our findings (including Figs. 5–7 and additional references to the Usher, Luyckx, and other studies that identified decision suboptimalities). The comment about “an additional computational step” in converting the observations to evidence was particularly useful, in that it made us realize that we were making what we now consider to be a faulty assumption in our version of the DDM. Specifically, we assumed that subjective misestimates of the correlation affected how observations were converted to evidence (logLR) to form the decision (implemented as a scaling of the bound height), but we neglected to consider how suboptimalities in encoding the observations could also lead to misestimates of the correlation. We have retained the previous best-fitting models in the text, for comparison (the “bound-rho-hat” and “bound-rho-hat + drift” models). In addition, we now include a “full-rho-hat” model that assumes that misestimates of rho affect both the encoding of the observations, which affects the drift rate and bound height, and the weighing of the evidence, which affects only the bound height. This was the best-fitting model for most participants (after accounting for different numbers of parameters associated with the different models we tested). Note that the full-rho-hat model predicts the lack of correlation-dependent choice effects and the substantial correlation-dependent RT effects that we observed, without requiring any additional adjustments to the drift rate (as we resorted to previously).

      In summary, we believe that we now have a much more parsimonious account of our data, in terms of a model in which subjective estimates of the correlation are alone able to account for our patterns of choice and RT data. We fully agree that more work is needed to better understand the source of these misestimates but also think those questions are outside the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few minor comments:

      (1) Evidence can be correlated in multiple ways. It could be correlated within individual pieces of evidence in a sequence, or across elements in that sequence (e.g., across time). This distinction is important, as it determines how evidence ought to be accumulated across time. In particular, if evidence is correlated across time, simply summing it up might be the wrong thing to do. Thus, it would be beneficial to make this distinction in the Introduction, and to mention that this paper is only concerned with the first type of correlation.

      We now clarify this point in the Introduction (pp. 5–6).

      (2) It is unclear without reading the Methods how the blue dashed line in Figure 4c is generated. To my understanding, it is a prediction of the naive DDM model. Is this correct?

      We now specify the models used to make the predictions shown in Fig. 4c (which now includes an additional model that uses unscaled observations as evidence).

      (3) In Methods, given the importance of the distribution of x1 + x2, it would be useful to write it out explicitly, e.g., x1 + x2 ~ N(2 mu_g, ..), specifying its mean and its variance.

      Excellent suggestion, added to p. 38.
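      For reference, a sketch of the distribution the reviewer asks for, under assumptions that are not stated in this response: x1 and x2 are jointly Gaussian with common mean mu_g, common variance sigma^2, and correlation rho, and the two alternatives have means +mu_g and -mu_g. The symbols sigma and rho are introduced here for illustration only.

```latex
\begin{align}
  x_1 + x_2 &\sim \mathcal{N}\!\left(2\mu_g,\; 2\sigma^2(1+\rho)\right), \\
  \log \mathrm{LR} &= \frac{2\mu_g}{\sigma^2(1+\rho)}\,(x_1 + x_2).
\end{align}
```

      Under these assumptions the weight on the summed observations grows as rho becomes more negative and shrinks as rho becomes more positive, matching the direction of the weighting described in the Reviewer #2 summary.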

      (4) From Methods and the caption of Figure 6 - Supplement 1 it becomes clear that the fitted DDM features a bound that collapses over time. I think that this should also be mentioned in the main text, as it is a not-too-unimportant feature of the model.

      Excellent suggestion, added to p. 15, with reference to Fig. 6-supplement 1 on p. 20.

      (5) The functional form of the bound is 2 (B - tb t). To my understanding, the effective B changes as a function of the correlation magnitude. Does tb as well? If not, wouldn't it be better if it does, to ensure that 2 (B - tb t) = 0 independent of the correlation magnitude?

      In our initial modeling, we also considered whether the correlation-dependent adjustment, which is a function of both correlation sign and magnitude, should be applied to the initial bound or to the instantaneous bound (i.e., after collapse, affecting tb as well). In a pilot analysis of data from 22 participants in the 0.6 correlation-magnitude group, we found that this choice had a negligible effect on the goodness-of-fit (deltaAIC = -0.9, protected exceedance probability = 0.63, in favor of the instantaneous bound scaling). We therefore used the instantaneous bound version in the analyses reported in the manuscript but doubt this choice was critical based on these results. We have clarified our implementation of the bound in Methods (pp. 43–44).

      Reviewer #2 (Recommendations for the authors):

      In addition to the points raised above, I have some minor suggestions/open questions that arose from my reading of the manuscript:

      (1) Are the predictions outlined in the paper specific to cases where the two sources are symmetric around zero? If distributions are allowed to be asymmetric then one can imagine cases (i.e. when distribution means are sufficiently offset from one another) where positive correlations can increase evidence strength and negative correlations decrease evidence strength. There's absolutely still value and much elegance in what the authors are showing with this work, but if my intuition is correct, it should ideally be acknowledged that the predictions are restricted to a specific set of generative circumstances.

      We agree that there are a lot of ways to manipulate correlations and their effect on the weight of evidence. At the end of the Discussion, we emphasize that our results apply to this particular form of correlation (p. 32).

      (2) Isn't Figure 4C misleading in the sense that it collapses across the asymmetry in the effect of negative vs positive correlations on RT, which is clearly there in the data and which simply adjusting the correlation-dependent scale factor will not reproduce?

      We agree that this analysis does not address any asymmetries in suboptimal estimates of positive versus negative correlations. We believe that those effects are much better addressed using the model fitting, which we present later in the Results section. We have now simplified the analyses in Fig. 4c, reporting the difference in RT between positive and negative correlation conditions instead of a linear regression.

      (3) I found the transition on p.17 of the Results section from the scaling of drift rate by correlation to scaling of bound height to be quite abrupt and unclear. I suspect that many readers coming from a typical DDM modelling background will be operating under the assumption that drift rate and bound height are independent, and I think more could be done here to explain why scaling one parameter by correlation in the present case is in fact directly equivalent to scaling the other.

      Thank you for the very useful feedback. We have substantially revised this text to make these points more clearly.

      (4) P.3, typo: Alan *Turing*

      That’s embarrassing. Fixed.

      (5) P.27, typo: "participants adopt a *fixed* bound"

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

      We appreciate the reviewers’ suggestions and comments. For the phylogenetic ANOVA, we used EVE, which tests for a separate RNA expression optimum specific to the shrew lineage, consistent with expectations for adaptive evolution of gene expression. But, as you noted, while this analysis highlights many candidate genes evolving in a manner consistent with positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. In the discussion, we now emphasize that inferred adaptive expression of these genes is putative and outline that future studies are needed to test the function of proposed adaptations. For example, our cell line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illuminates this limitation. We also have refined our discussion to focus more on pathway-level analyses rather than on individual genes, and have addressed other issues presented, including clarity of methods and using sex as a covariate in our analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Thank you for your interest in our manuscript and for your insights. We addressed your comments below: we now highlight the limitations of our study design in the discussion and emphasize that, while a second optimum of gene expression in shrews is consistent with adaptive evolution, we recognize that not all sources of variation in gene expression can be fully accounted for. We highlight the putative nature of these results in our revisions, especially in our new limitations section (lines 541-555).

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      We are glad to hear that you found our manuscript fascinating and clearly written. While we hoped to see an effect of BCL2L1 on apoptosis as proposed, we agree that reporting null results is valuable when validating evolutionary inferences.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in comparable environments as those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to other, uninfected individuals, we would see many, strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features - the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from seasonal expression data in the shrews to non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Thank you for taking the time to provide such insightful feedback. As you noted, while shrews experience seasonal size changes, their environments may differ from those of the other species used in this experiment, leading to increased or decreased expression of certain genes and reducing our ability to accurately detect selection across the phylogeny. Although we sought to control for as many sources of variation as possible, such as using only post-pubescent, wild, or non-domesticated individuals when feasible, we recognize that not all sources of variation can be fully accounted for within a practical experiment. We agree that these sources of variation can introduce both false positives and negatives into our results, and we have now highlighted this limitation within our discussion (lines 538-552).

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      Thank you for this attention to detail. We used spring adults for this analysis. This decision was made because we used only post-pubescent individuals for all species in the analysis, and this was the only season in which adult shrews were going through Dehnel’s phenomenon. We have now clarified this in both the methods and results (line 247 and line 667).

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      We did test for decreased shrew expression compared to the rest of the species, but no genes showed significant decreases. We hypothesize that there are two potential reasons for this result: 1) if a gene were to be selected for decreased expression, selection for constitutive expression of the gene across all species may be weak, and thus decreased expression may also be found in other lineages, or 2) decreased or no expression may relax selection on the coding regions, and thus these genes are not pulled out as we identify 1:1 orthologs. This is consistent with results reported in the original methods manuscript. Thank you for pointing out that we did not discuss this information in the text, and we now include it in our results (lines 250-251).

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      We agree with your point. While our extensive discussion is useful for testing future hypotheses, some of it may ultimately be too speculative for our readers. To amend this, we have reduced some portions of our discussion and focused more on pathways than individual genes, including removing mechanisms related to HRH2, FAM57B, GPR3, and GABAergic neurons. We hope that this highlights to the reader the speculative nature of many of our results.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.

      Thank you for highlighting this approach; it is greatly needed. After running this test, we found that the observed overlap exceeded the expected overlap, but not significantly. We now show this in our methods (lines 277-278) and results (lines 719-720).
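      The resampling test the reviewer describes can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the function name `overlap_null`, the gene-universe size, and the set sizes are hypothetical placeholders, and it assumes both gene sets are drawn uniformly from the same background of tested genes.

```python
import random

def overlap_null(n_genes, size_a, size_b, observed, n_draws=10_000, seed=0):
    """Estimate the chance overlap of two random gene sets.

    Returns (mean overlap across draws, empirical one-sided p-value for
    seeing an overlap at least as large as `observed`).
    """
    rng = random.Random(seed)
    universe = range(n_genes)  # indices standing in for the background genes
    overlaps = []
    for _ in range(n_draws):
        a = set(rng.sample(universe, size_a))  # random "seasonal" set
        b = set(rng.sample(universe, size_b))  # random "inter-species" set
        overlaps.append(len(a & b))
    mean_overlap = sum(overlaps) / n_draws
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (1 + sum(o >= observed for o in overlaps)) / (n_draws + 1)
    return mean_overlap, p_value

# Hypothetical sizes for illustration; substitute the real counts.
mean_overlap, p_value = overlap_null(10_000, 200, 150, observed=5, n_draws=2_000)
```

      An observed overlap that is larger than `mean_overlap` but has a large `p_value` would correspond to the outcome described in the response: more shared genes than expected, but not significantly so.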

      Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, as well as used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      We are glad you found our manuscript interesting and thank you for your feedback. We hope that we have addressed all of your concerns as described below.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

      Thank you for this insight. With a lengthy introduction and discussion, we agree that the importance of Dehnel’s phenomenon may have been overshadowed. We have shortened both sections and emphasized the background on Dehnel’s phenomenon in the first two paragraphs of the introduction, allowing this extraordinary seasonal size plasticity to stand out.

      Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling: not only do they exemplify an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      Thank you for finding our research and manuscript compelling, and for the valuable feedback that will substantially improve our manuscript. We hope that we have alleviated your concerns by following your suggestions below.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).

      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      Thank you for this valuable feedback. We did not want to include clusters we deemed to be related to development, as these should not be attributed to changes associated with Dehnel’s phenomenon. We did this through qualitative, visual inspection, which we realize can differ between parties (i.e., clusters 2, 8, and 12 appeared to be seasonal). Qualitatively, we were looking for extreme divergence between Stage 1 and Stage 5 individuals: if expression were related to season and not development, then the averages of these stages within a cluster should be relatively similar. We have now quantified this as large differences in z-score (abs(summer juvenile-summer adult)>1.25) without meaningful interseason variation as determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added it to both our methods (lines 699-702) and results (line 192).
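
      As an illustration only, the thresholds quoted above can be written as a small predicate. This is a minimal sketch, not the authors' code: the function name, the dictionary keys, and the choice to read "summer" in the second threshold as the summer adult stage are all assumptions, and spring is left out because the quoted criterion does not mention it.

```python
def is_developmental_shift(z, juvenile_adult_min=1.25, interseason_max=0.5):
    """Flag a cluster whose mean z-scores look developmental rather than seasonal.

    `z` maps stage names to the cluster's mean z-score. The keys used here are
    hypothetical labels, and "summer" in the winter-summer comparison is
    ASSUMED to mean the summer adult stage.
    """
    # Large divergence between summer juveniles and summer adults.
    large_age_gap = abs(z["summer_juvenile"] - z["summer_adult"]) > juvenile_adult_min
    # No meaningful change between consecutive seasons.
    flat_across_seasons = (
        abs(z["autumn"] - z["winter"]) < interseason_max
        and abs(z["winter"] - z["summer_adult"]) < interseason_max
    )
    return large_age_gap and flat_across_seasons


# Toy clusters with made-up z-scores, for illustration only.
developmental_like = {"summer_juvenile": 1.6, "autumn": -0.3, "winter": -0.4, "summer_adult": -0.5}
seasonal_like = {"summer_juvenile": 0.9, "autumn": 0.4, "winter": -1.2, "summer_adult": 0.8}
```

      Under these assumptions, the first toy cluster is flagged as developmental and the second is retained as seasonal.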

      Regarding the combination of clusters for pathway enrichment compared to individual clusters, we agree that combining clusters may be more informative for overall homeostasis, whereas individual clusters may inform us about processes directly related to Dehnel’s phenomenon. Initially, we were hesitant to conduct this analysis, as clusters contain small gene sets, reducing the ability to detect pathway enrichments. We have now included this analysis, which is reported in our methods (lines 703-704), results (lines 203-204), and a new supplemental table.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We agree that our rationale for validating BCL2L1 function in neural cell lines was not clearly explained in the manuscript. We selected BCL2L1 because it is the furthest downstream gene in the apoptotic pathway, thus making it the most directly involved gene in programmed cell death, whereas upstream genes could influence additional genes or alternative processes. We have clarified this choice in the revised methods section (lines 748-750).

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could impact the discovery of differentially expressed genes.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio within species when possible. EVE models handle the effect of sex on gene expression as intraspecific variation. In shrews, however, we used males exclusively, as females were only found among juvenile individuals. Including those juvenile females would have introduced age effects, with perhaps a larger effect on our results. For the seasonal data, we have now included sex as a covariate in differential expression analyses. However, our design is imbalanced in relation to sex, which we have now discussed in our methods (lines 713-714) and discussion limitations (lines 544-548).

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

      Thank you for this insight. We have reviewed our discussion and clarified that adaptations are putative (i.e. lines 146, 285, and 332), and highlighted this in our limitations section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would recommend always spelling out "Dehnel's phenomenon" or even replacing this term (after crediting the DP term) with the more informative "seasonal size plasticity". Every time I saw "DP", I had to remind myself what this referred to. If the authors choose not to do so, please use the acronym consistently (e.g. line 186 has it spelled out).

      We have replaced the acronym DP with either the full term or the more informative “seasonal size plasticity” throughout the text.

      (2) Line 202: "DEG" has not been defined. Simply add to the line before.

      Thank you for this attention to detail. We have added this to the line above (210).

      (3) Please add a reference for the "AnAge" tool that was used to determine if samples were pubescent.

      Thank you for identifying this oversight. We have now cited the proper paper in line 634.

      (4) In the BCL2L1 section in the results, add a callout to Figure 2D.

      We have now added a callout to Figure 2D within the results (line 234).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 122: is associated? These adaptations?

      Thank you for identifying that we were missing the words “associated with” here. We have fixed this in the revision.

      (2) The first paragraph of the Results should be moved to the methods, except maybe the number of orthologs.

      Thank you for this insight. We have removed this portion from the results section.

      (3) Why a Bonferroni correction on line 188? That seems too strict.

      We agree the Bonferroni correction is strict. However, results are also not significant after correction with other, less strict methods for controlling the false discovery rate. These corrections can be found within the data; however, we only report the Bonferroni correction.
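
      To make the comparison between correction methods concrete, the sketch below adjusts a set of p-values with Bonferroni and with the Benjamini-Hochberg procedure. The response does not name the less strict methods that were tried, so Benjamini-Hochberg is an assumption here, and the p-values are invented for illustration.

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(monotone, 1.0)
    return adjusted


# Hypothetical p-values, not taken from the study.
example_p = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
```

      A Benjamini-Hochberg adjusted p-value is never larger than the corresponding Bonferroni one, so a result that fails the less strict correction also fails Bonferroni.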

      (4) Line 427: "is a novel candidate gene for several neurological disorders" needs some references. I see them a couple of sentences later, but that's quite a sentence with no references at the end.

      We have added the proper citations for this sentence (line 524).

      Reviewer #3 (Recommendations for the authors):

      (1) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text Line176-193:

      - The authors report the total number of genes meeting inclusion criteria (>0.5-fold change between any two stages and 2 samples >10 normalized reads), but it would be more informative to also provide the number of genes within each temporal cluster. This would offer a clearer understanding of how gene expression patterns are distributed over time.

      Unfortunately, this information is difficult to depict on our figure and would use too much space in the text. We have thus added a description of the range of genes in a new supplemental table depicting this information.

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Fig. 1B). Using a differential gene expression criterion might be more suitable. For example, do excluded genes show significant log-fold differences between late-stage comparisons?

      As previously mentioned, we have now quantified the distinction between developmental and seasonal shifts, defining developmental shifts as large differences in z-score (abs(summer juveniles-summer adults)>1.25) without meaningful interseason variation as determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added this to our methods (lines 699-702). We then follow this up with differential expression analyses as described in Figure 2.

      - Did the authors perform cluster-specific GO or KEGG pathway enrichment analyses instead of focusing on the combined set of genes across the season shift clusters? While I understand that the small number of genes in each cluster may be limiting, if pathways emerge from cluster-specific analysis, they could provide more detailed insights into the functional significance of these temporal expression patterns. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses. Additionally, no corrections for multiple hypothesis testing were applied, as noted in the results. A more refined gene set (e.g., using differential expression criteria, described above) could be more appropriate for these analyses.

      We have now included cluster-specific KEGG enrichments as previously described.

      (2) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets - Figure 2 and lines195-227:

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We have now included the reasoning for further validation of BCL2L1 as described above.

      - The relevance of the "higher degree" differentially expressed genes needs more explanation. Although this group of genes is highlighted in the results, they are not featured in any subsequent analyses, leaving their importance unclear.

      Thank you for this insight. We have removed this from the methods as it is not relevant to subsequent analyses or conclusions.

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis (Line 525), but it is unclear if this was also done for the S. araneus analysis. If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis?

      We have now incorporated information on sex as described above.

      (3) Discussion:

      The term "adaptive" is used frequently and liberally throughout the discussion, but the authors should be cautious in interpreting seasonal changes in gene expression as indicators of adaptive evolution. Such changes do not necessarily imply causal or adaptive associations, and this distinction should be clearly stated when discussing the results.

      Thank you for this feedback; we agree with your conclusion. While a second expression optimum in the shrew lineage is indicative of adaptive expression, we cannot fully determine whether these shifts are caused by genetic or environmental factors, despite careful attention to experimental design. We have highlighted this as a limitation in the discussion.

      (4) Minor Editorial Comment:

      Line 105: "... maintenance of an energy budgets..." delete "an"

      We have removed this grammatical error.

    1. Author response:

      Reviewer #1:

      Strengths:

      (1) Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise, and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones.

      (2) The method is novel, original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions.

      Weaknesses:

      In the way it is written, this work might be mostly read by physicists. We believe that, with some rewriting, the authors could better highlight the ecological implications of the results and make the method more accessible to a broader audience.

      We thank the reviewer for their positive and constructive feedback. We particularly appreciate the recognition of the novelty and robustness of our method, as well as the insight that it sheds light on the shifting ecological forces between healthy and diseased microbiomes. In response to the concern about the manuscript’s accessibility, we aim to revise key sections – including the Introduction, Results, and Discussion – to more clearly articulate the ecological relevance of our theoretical findings. We would like to emphasize that our approach offers a novel perspective for analyzing individual species' abundances, as well as for understanding interaction patterns and stability at the community level. By placing our results within a broader context accessible to readers from diverse backgrounds, we aim for the revised version to appeal to a wider audience, including ecologists and microbiome scientists, while preserving the rigor of our underlying statistical physics framework.

      Reviewer #2:

      Strengths:

      A well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data. 

      Weaknesses:

      (1) This sentence in the introduction seems key to me: "Focusing on single species properties as species abundance distribution (SAD), it fails to characterise altered states of microbiome." Yet it is not explained what is meant by 'fail', and thus what the proposed approach 'solves'.

      (2) Lack of validation, following arbitrary modelling choices made (symmetry of interactions, weak-interaction limit, uniform carrying capacity). Inconsistent interpretation of instability. Here, instability is associated with the transition to the marginal phase, which becomes chaotic when interaction symmetry is broken. But as the authors acknowledge, the weak interaction limit does not reproduce fat-tailed abundance distributions found in data. On the other hand, strong interaction regimes, where chaos prevails, tend to do so (Mallmin et al, PNAS 2024). Thus, the nature of the instability towards which unhealthy microbiomes approach is unclear.

      (3) Three technical points about the methodology and interpretation. a) How can the order parameters h and q0 be inferred, if in the compositional data they are fixed by definition? b) How is it possible that weaker interaction variance is associated with an approach to instability, when the opposite is usually true? c) Having an idea of how the empirical data compare to the theoretical fits would be valuable.

      Implications: As the authors say, this is a proof of concept. They point at limits and ways to go forward, in particular pointing at ways in which species abundance distributions could be better reproduced by the predicted dynamical models. One implication that is missing, in my opinion, is the interpretability of the results, and what this work achieves that was missing from other approaches (see weaknesses section above): what do we learn from the fact that changes in microbial interactions distinguish healthy from unhealthy microbiota? For instance, what does this mean for medical research?

      We greatly appreciate the reviewer’s thoughtful analysis highlighting both the strengths and areas of ambiguity in our work.

      (1) To clarify the sentence on the limitations of species abundance distributions (SADs), we aim to explain in the revised version that while SADs summarize the relative abundance of individual species, they fail to capture the species-species correlations that we have shown (Seppi et al., Biomolecules 2023) to be more sensitive to the health state of the host. Our method thus focuses on the interaction statistics among species, providing insights into the underlying dynamics and stability of the microbiomes and their differences between healthy and unhealthy hosts.

      (2) Regarding model assumptions, we acknowledge that the weak interaction regime and symmetry hypotheses simplify the analysis and may not capture all empirical richness, such as fat-tailed distributions of species abundance. However, we interpret instability not as a path to chaos per se, but as a transition toward a multi-attractor phase, where each microbiome reaches a different fixed point. This is consistent with prior empirical findings invoking the “Anna Karenina principle”, where healthy microbiomes resemble one another, but disease states tend to deviate from this picture (see Pasqualini et al., PLOS Comp. Bio. 2024). We consider our framework as a starting point and agree that further extensions incorporating strong interaction regimes (as suggested by Mallmin et al., PNAS 2024) or relaxing other model assumptions could reveal even richer dynamical patterns. The computational pipeline we present can, in fact, be easily generalized to include different population dynamics models.
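
      For readers less familiar with this class of models, the following is a minimal sketch of a disordered generalized Lotka-Volterra simulation with symmetric interactions, a uniform carrying capacity, demographic noise, and immigration. It is not the authors' inference pipeline. The specific equation, the Euler-Maruyama integration, the clipping at zero, and every parameter name and value (`mu`, `sigma`, `T`, `lam`, `K`) are assumptions chosen for illustration.

```python
import numpy as np


def simulate_glv(S=50, mu=1.0, sigma=0.3, T=0.01, lam=1e-3, K=1.0,
                 dt=1e-3, steps=5000, seed=0):
    """Euler-Maruyama integration of an ASSUMED disordered gLV model:

        dN_i = [N_i (K - N_i - sum_j A_ij N_j) + lam] dt + sqrt(2 T N_i) dW_i

    A is symmetric with off-diagonal mean mu/S and variance sigma^2/S.
    """
    rng = np.random.default_rng(seed)
    # Draw the upper triangle and mirror it so that A_ij == A_ji, zero diagonal.
    upper = np.triu(rng.normal(mu / S, sigma / np.sqrt(S), size=(S, S)), k=1)
    A = upper + upper.T
    N = np.full(S, K / 2.0)
    for _ in range(steps):
        drift = N * (K - N - A @ N) + lam
        noise = np.sqrt(2.0 * T * N * dt) * rng.standard_normal(S)
        # Clip at zero: abundances cannot be negative (a crude boundary rule).
        N = np.maximum(N + drift * dt + noise, 0.0)
    return N


abundances = simulate_glv()
h_sim = abundances.mean()            # first moment of the abundances
q0_sim = (abundances ** 2).mean()    # second moment of the abundances
```

      Swapping the drift term for another population dynamics model changes only one line, which is the sense in which a pipeline built around such a simulation could be generalized.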

      On the technical questions: (a) While compositional data constrain relative abundances, we can still estimate diversity-dependent parameters (h and q0) using alpha-diversity statistics across samples, which show meaningful variation; (b) The counter-intuitive instability that the reviewer pointed out arises from the interplay between demographic stochasticity and quenched disorder. It is the combined contribution of these two factors in phase space – not either one alone – that drives the transition. For clarity, see Figure 1 in Altieri et al., Phys. Rev. Lett. 2021; (c) We plan to include plots that compare empirical data to theoretical model fits. This will help visualize how well the model captures observed microbial community properties and how, even with larger species interaction heterogeneity (σ) and larger demographic noise (T), healthy communities are more stable (i.e., more distant from the critical line), as measured by the replicon eigenvalue.

      Finally, regarding interpretability and implications: by showing that ecological interaction networks – not just species identities – differ between healthy and unhealthy states, our work suggests a conceptual shift. This could inform medical strategies aimed at restoring community-level stability rather than targeting individual microbes. In the revised Discussion section, we will elaborate on this point to better highlight its practical implications and outline potential directions for future research.

      Reviewer #3:

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems, and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out. 

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that were first processed in Pasqualini et al. (2024). The use of metagenomic data leads me to a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as h. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that it was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort, so it would be useful if the number of reads were plotted against the number of species for healthy and diseased subjects. However, the role of sampling effort can depend on the type of data, and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigate allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome, though, is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge, there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. 
      Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al., but moving forward, a more comprehensive study seems necessary.

      We thank the reviewer for this insightful and nuanced comment, which particularly highlights the broader methodological context of our data sources. Indeed, metagenomic sequencing introduces different biases with respect to 16S data. First, we would like to emphasize that we estimated the order parameters from the data by using relative abundances. Second, while the concern regarding the influence of sequencing depth and species diversity on the estimation of the order parameters is valid, we refer to a previous publication by some of the authors (Pasqualini et al., 2024; see Figure 4, panels g and h). There, we pointed out that the observed outcome is weakly influenced by sequencing depth in our dataset, while the main impact on the order parameters estimate comes from the species diversity of the two groups. In the same publication, we showed that other well-known patterns (species abundance distribution, mean abundance distribution) are also observed. Also, to mitigate the effect of the number of samples and sequencing depth, we estimated the order parameters by a bootstrap procedure (90% of samples for healthy and diseased groups, 5000 resamples), which resulted in the error bars in Figure 2.
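
      The resampling step described above can be sketched as follows. This is an illustrative reading, not the authors' implementation. The definition of h as the average of 1/(number of species) follows the reviewer's description; the definition of q0 as the mean squared relative abundance per observed species, the choice to draw each 90% subset without replacement, and all function names are assumptions.

```python
import numpy as np


def order_parameters(rel_abund):
    """Estimate (h, q0) from a samples-by-species matrix of relative abundances.

    Rows are assumed to sum to 1, with zeros marking unobserved species.
    """
    n_species = (rel_abund > 0).sum(axis=1)
    # With compositional data, the mean abundance per observed species is 1/S.
    h = np.mean(1.0 / n_species)
    # ASSUMED definition: mean squared abundance per observed species.
    q0 = np.mean((rel_abund ** 2).sum(axis=1) / n_species)
    return h, q0


def bootstrap_order_parameters(rel_abund, frac=0.9, n_resamples=5000, seed=0):
    """Mean and standard deviation of (h, q0) over random 90% subsets of samples."""
    rng = np.random.default_rng(seed)
    n_samples = rel_abund.shape[0]
    subset_size = max(1, int(round(frac * n_samples)))
    estimates = np.empty((n_resamples, 2))
    for b in range(n_resamples):
        # ASSUMPTION: each resample draws samples without replacement.
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        estimates[b] = order_parameters(rel_abund[idx])
    return estimates.mean(axis=0), estimates.std(axis=0)
```

      The standard deviations returned by the second function play the role of error bars; running the procedure separately on the healthy and diseased groups gives one pair of estimates per group.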

      We also fully agree with the broader call for a systematic comparison of macroecological patterns derived from 16S and metagenomic data. While some of us have already begun exploring this direction (e.g., Pasqualini et al., 2024), the reviewer’s comment highlights its significance and motivates us to pursue a more comprehensive, integrative analysis across data types. While we found qualitative agreement of these patterns with previous publications (e.g., Grilli, Nature Comm. 2020), we will acknowledge this as an important future direction in the Discussion section.

      References

      (1) Seppi, M., Pasqualini, J., Facchin, S., Savarino, E.V. and Suweis, S., 2023. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1), p.5.

      (2) Pasqualini, J., Facchin, S., Rinaldo, A., Maritan, A., Savarino, E. and Suweis, S., 2024. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9), p.e1012482.

      (3) Mallmin, E., Traulsen, A. and De Monte, S., 2024. Chaotic turnover of rare and abundant species in a strongly interacting model community. Proceedings of the National Academy of Sciences, 121(11), p.e2312822121.

      (4) Altieri, A., Roy, F., Cammarota, C. and Biroli, G., 2021. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25), p.258301.

      (5) Grilli, J., 2020. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), p.4743.

    1. Author response:

      Reviewer 1:

      (1) Clarification of axon mistargeting patterns and model interpretation

      We will clarify the apparent discrepancy between chick and mouse axon mistargeting data. Specifically, we will expand the explanation in the main text and Figure 7 legend and/or revise the model in Figure 7 to better reflect observed phenotypes and clarify how Sp1 overexpression contributes to mistargeting.

      (2) Evidence for Sp1-dependent ephrin expression

      We agree that demonstrating ephrin expression changes in motor neurons is essential. We will:

      Conduct in situ hybridization and/or immunostaining for ephrins in control and Sp1 mutant spinal cords from both chick and mouse embryos.

      Clarify and expand the methodological details of the NSC-34 cell experiments shown in Figure 4G.

      (3) RNA-seq experiment details

      We will revise the Methods section to provide additional experimental details.

      (4) Use of Syn1-cre

      We acknowledge concerns about the broad expression of Syn1-cre. To address this:

      We will clarify our rationale for using Syn1-cre and describe its expression pattern in the spinal cord.

      We are evaluating the feasibility of additional experiments using a motor neuron-specific Cre driver to confirm cell-type specificity.

      We will include a new paragraph in the Discussion addressing potential contributions from other neuronal populations.

      Reviewer 2:

      (1) & (2) Clarification and localization of RNA-seq data

      We will expand the Methods section to provide greater detail on the RNA-seq approach. In addition, we will validate ephrin downregulation in LMC neurons using in situ hybridization and/or immunostaining.

      (3) Integration of ChIP and RNA-seq data

      We will:

      Report additional ChIP peaks for ephrinA5 and other differentially expressed genes such as Sema7a.

      Add a summary figure that integrates ChIP and RNA-seq results to strengthen the link between Sp1 binding and transcriptional regulation.

      (4) Clarification of the cis-attenuation model

      We recognize that our data do not yet directly demonstrate Sp1’s role in cis-attenuation. To address this:

      We will revise the abstract and main text to frame Sp1's role in cis-attenuation as a hypothesis.

      We are exploring the feasibility of ephrinA5 and B2 rescue experiments in Sp1-deficient embryos to test specificity.

      (5) Behavioral phenotypes and cell-type specificity

      We will clarify that behavioral phenotypes may result from combined effects across neuron populations due to Syn1-cre expression. To address this:

      We are planning rescue experiments with Sp1 expression in chick embryos to test for rescue of axon misrouting.

      We will include a new paragraph in the Discussion to highlight this limitation and discuss alternative interpretations.

      Reviewer 3:

      We appreciate your positive evaluation and support for the rigor of our study.

      In response to your suggestions:

      We are revising the manuscript to improve clarity and flow, particularly the transitions between datasets.

      We will update Figure 7 and the associated text to more clearly convey the working model and avoid overinterpretation.

      We thank all reviewers for their constructive feedback and are committed to addressing each point thoroughly. All revisions will be clearly marked in the resubmitted manuscript.

    1. Author response:

      (This author response relates to the first round of peer review by Biophysics Colab. Reviews and responses to both rounds of review are available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601.)

      General Assessment:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures have now been solved of different Panx subtypes from different species, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments supporting that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for immune response and demonstrates that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      Recommendations:

      (1) The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the 2 key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI- cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      (2) It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so called instantaneous IV).

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of the current study.

      (3) Data is presented for expression of Panx channels in different cell types (HEK293 vs HEK293S GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing as the vector control shown is for HEK cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293 GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI- than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      (4) The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments so anything that could be done to bolster confidence in what it reports on would be valuable to readers. Bringing in as many control experiments that have been done, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflects fluorescence reduction over time, starting from LPC addition, normalized to the maximal decrease observed after Triton-X100 addition (3 minutes), enabling consistent quenching value comparisons. Although the quenching value appears small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the lysophospholipid agonistic effects. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
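
      The normalization described above can be illustrated with a minimal sketch. It assumes the trace begins at the moment of LPC addition, so that its first reading serves as the baseline, and that a single post-Triton reading defines the maximal decrease. The function name `quench_percent` and its arguments are hypothetical and are not taken from the manuscript.

```python
def quench_percent(trace, f_triton):
    """Convert a fluorescence trace into Quench (%).

    trace    -- fluorescence readings starting at LPC addition;
                trace[0] is treated as the unquenched baseline (assumption).
    f_triton -- fluorescence after Triton X-100 addition, taken here as
                the maximal achievable decrease (assumption).
    """
    if not trace:
        raise ValueError("trace must contain at least one reading")
    baseline = trace[0]
    max_drop = baseline - f_triton
    if max_drop <= 0:
        raise ValueError("post-Triton fluorescence must be below baseline")
    # Each point is the fraction of the maximal decrease reached so far.
    return [100.0 * (baseline - f) / max_drop for f in trace]
```

      Under this scheme a closed channel reads 0 % at time zero and the value rises toward 100 % as iodide enters and quenches, which matches the direction of the relabelled axis.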

      (5) Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion favouring Panx channels? We appreciate that the biophysical properties of Panx channel remain mysterious, but it would help to hear a bit more about the authors’ thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence its ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how Panx1 permeates both anions and cations. Based on our findings, ion selectivity may vary with activation stimulus intensity and duration. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      (6) In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.

      Given that new experiments were included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.

      (7) The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      (8) The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      (9) The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      (10) In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the extended NTD into the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      (11) Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      (1) The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      (2) The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It’s hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      (3) From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding, however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC induced current, also exhibited lower currents through voltage activation in Fig S5, raising the possibility the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      (4) Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      (5) Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      (6) Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      (7) It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      (8) Including particles number in each class in Fig S8 panel C and D would help in evaluating the quality of classification.

      Noted.

      (9) A table for cryo-EM statistics should be included.

      Thanks, noted.

      (10) n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      (11) The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      (12) Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      (13) What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data shows that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      (14) Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      (15) From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it’s worth noting that to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      (16) Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      (17) While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functionalities of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have added a clearer explanation of this point to the discussion of the revised manuscript. The lipid classes that are affected by the depletion of Wag31 differ from those affected by its overexpression. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex (Meniche et al., 2014; Xu et al., 2014) that synthesize fatty acid precursors and regulate their activity (Habibi Arejan et al., 2022).

      The varied response on lipid homeostasis could be attributed to a change in the stoichiometry of these interactions of Wag31. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on Wag31-protein partner interactions, its overexpression would lead to promiscuous interactions and a change in the stoichiometry of native interactions that would ultimately modulate lipid synthesis pathways.

      (2) The pulldown assays results are interesting, but links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-Wag31 complemented at an integrative locus in Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates but not in the control were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.
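
      The selection logic described above (present in every replicate, absent from the control, then thresholded and ranked) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the data layout, the use of the lowest value across replicates, and the reading of both cut-offs as "at least" are all hypothetical choices made for the example.

```python
def select_interactors(replicates, control, min_psm=18, min_unique=5, top_n=25):
    """Pick candidate interactors from replicate pulldown tables.

    replicates -- list of dicts, one per biological replicate, mapping
                  protein name -> (psm_count, unique_peptide_count).
    control    -- iterable of protein names detected in the control pulldown.
    The "at least" reading of both cut-offs is an assumption.
    """
    if not replicates:
        return []
    # Keep only proteins detected in every replicate.
    shared = set(replicates[0])
    for rep in replicates[1:]:
        shared &= set(rep)
    # Subtract anything that also appears in the control.
    shared -= set(control)

    scored = []
    for protein in shared:
        # Conservative summary (assumption): lowest value across replicates.
        psm = min(rep[protein][0] for rep in replicates)
        unique = min(rep[protein][1] for rep in replicates)
        if psm >= min_psm and unique >= min_unique:
            scored.append((psm, protein))
    # Rank by PSM count, highest first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [protein for _, protein in scored[:top_n]]
```

      Taking the minimum across replicates is one conservative option; averaging or summing the counts would rank hits differently, and the response does not say which summary was used.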

      As mentioned in line 139 of the previous version of the manuscript, we agree that the interactions can either be direct or through a third partner. The fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used were quite stringent for these pull-down assays—the wash buffer contained 1% Triton X100 that eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. As mentioned above, this caveat was stated in the previous version of the manuscript.

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids in intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advancement in the field of spatial biology underscores the importance of native localization of various biological molecules crucial for maintaining a steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the role of the N- and C- terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence. (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pulldown experiments lack a valid negative control.

      We thank the reviewer for the comment. We have included two non-interactors of Wag31 i.e. MmpL4 and MmpS5 which were not identified in our interactome database as negative controls in the experiment. As shown in Figure S3, we performed His pull-down experiments with both of them independently twice, each time with a positive control (known interactor of Wag31 (Msm2092)). Fig. S3b revised shows E. coli lysate expressing His-Wag31 which was incubated with Msm lysates expressing either FLAG tagged-MmpL4 or -MmpS5 or Msm2092 (revised Fig. S3c). The mixed lysates were pulled down with Cobalt beads that bind to the His-tagged protein and analysed using Western blot analysis by probing with anti-FLAG antibody (revised Fig. S3d.). The data presented confirms that the interactions validated through the pull down assay were indeed specific.

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      We thank the reviewer for the comment. Wag31<sub>Msm</sub> is a 272-amino-acid protein. The N-terminal of Wag31, which houses the DivIVA-domain, comprises the first 60 amino acids. Previously, we attempted to express the N-terminal (60 aa long) and the C-terminal (212 aa long) truncated proteins in various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither expressed with the N/C-terminal FLAG tag or no tag in episomal or integrative vectors due to instability of the protein. Eventually, we successfully expressed the C-terminal Wag31 with N- and C-terminal hexa-His tags. However, this expression was not sufficient or stable enough for us to perform Ni<sup>2+</sup>-affinity pull-down experiments for mass spectrometry. The N-terminal of Wag31 could not be expressed in M. smegmatis even with N- and C-terminal hexa-His tags.

      To rule out the role of the N-terminal in mediating protein-protein interactions, we cloned the N-terminal of Wag31 that comprises the DivIVA-domain in pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both the termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> or Wag31<sub>∆N</sub> (in the revised manuscript) were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7e-g). Thus, we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing either FLAG-MurG, -SepIVA, -Msm2092 or -AccA3 and pull down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 could bind to all the four interactors whereas His-Wag31<sub>∆C</sub> couldn’t, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal. However, we can’t ignore the possibility of other interactors binding to the N-terminal of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we are unable to perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      We thank the reviewer for the comment. Despite its promiscuous binding to other anionic phospholipids, 10-N-Nonyl-acridine orange is widely used to stain Cardiolipin and determine its localisation in bacterial cells and mitochondria of eukaryotes (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011). This is because it has a stronger affinity for Cardiolipin than for other anionic phospholipids, with an affinity constant of 2 × 10<sup>6</sup> M<sup>−1</sup> for Cardiolipin association and 7 × 10<sup>4</sup> M<sup>−1</sup> for phosphatidylserine and phosphatidylinositol association (Petit et al., 1992). Additionally, there is not yet another stain available for detecting Cardiolipin. Our protein-lipid binding assays suggest that Wag31 preferentially binds to Cardiolipin over other anionic phospholipids (Fig. 4b). Hence, it is likely that most of the redistribution of NAO fluorescence that we observe reflects Cardiolipin mislocalization due to altered Wag31 levels, with a smaller share of the NAO redistribution coming indirectly from other anionic phospholipids displaced from the membrane by the loss of membrane integrity and the cell shape changes caused by altered Wag31 levels.
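
      As a rough worked check of the selectivity argument, the ratio of the two association constants quoted above from Petit et al. (1992) can be computed directly. This is only a sketch: the names are illustrative, and it assumes that relative binding scales with the affinity constant alone, ignoring how abundant each lipid is in the membrane.

```python
# Association constants for NAO quoted in the response (Petit et al., 1992), in M^-1.
K_CARDIOLIPIN = 2e6
K_OTHER_ANIONIC = 7e4  # phosphatidylserine / phosphatidylinositol


def fold_selectivity(k_target: float, k_other: float) -> float:
    """Return how many times larger the target association constant is."""
    return k_target / k_other


ratio = fold_selectivity(K_CARDIOLIPIN, K_OTHER_ANIONIC)
print(f"NAO affinity for cardiolipin is about {ratio:.0f}-fold higher")
```

      A preference of this size is consistent with treating NAO as a cardiolipin-biased rather than a cardiolipin-exclusive stain, which matches the caveat about other anionic phospholipids stated above.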

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that the N-terminal may have another function that contributes to sustaining mycobacterial physiology and survival. We will revise our statements in the paper to reflect the data. The results shown suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival. However, additional functions of this region can’t be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process.

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. have shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located at the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of the protein-protein interactions shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expressing Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localisation (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023) and two groups have shown slightly sub-polar localisation of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, (Freeman et al., 2023) showed SepIVA to be a spatio-temporal regulator of MurG. MS/MS analysis of Wag31 immunoprecipitation data yielded both MurG and SepIVA to be interactors of Wag31 (Fig. 3). Given Wag31 also displays polar localisation, it is likely that it associates with the polar MurG. However, since a sub-polar localisation of MurG has also been reported, it is possible that they do not interact directly and another protein mediates their interaction. Based on the above, we will modify the model proposed in Fig. 8.

      We agree that for validation of interaction, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used were quite stringent for these pull-down assays—the wash buffer contained 1% Triton X100 that eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript and propose a model that reflects the results we obtained.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-E3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Schägger, H. (2006). Tricine-SDS-PAGE. Nat Protoc, 1(1), 16-22. https://doi.org/10.1038/nprot.2006.4

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect in levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have included a clarification for this in the discussion section.

      (2) The pulldown assays results are interesting, but the links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.
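
      The selection logic described above can be sketched as a simple filter. This is a hypothetical illustration rather than the pipeline actually used: it assumes that the cut-offs are minimum values, that any protein also recovered with the FLAG-GFP control is excluded outright, and that hits are ranked by PSM count. The `Hit` record, the function name and the example values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    protein: str
    psm: int               # peptide-spectrum matches
    unique_peptides: int
    replicates_seen: int   # number of pull-down replicates containing the protein
    in_gfp_control: bool   # also recovered with the FLAG-GFP control


def select_interactors(hits, n_replicates=3, min_psm=18, min_unique=5, top_n=25):
    """Keep reproducible, control-subtracted hits and rank them by PSM count."""
    kept = [
        h for h in hits
        if h.replicates_seen == n_replicates
        and not h.in_gfp_control
        and h.psm >= min_psm
        and h.unique_peptides >= min_unique
    ]
    return sorted(kept, key=lambda h: h.psm, reverse=True)[:top_n]


# Made-up placeholder entries, for illustration only (not data from the study).
example = [
    Hit("protein_A", 120, 30, 3, False),
    Hit("protein_B", 95, 22, 3, False),
    Hit("protein_C", 40, 9, 2, False),   # missing from one replicate
    Hit("protein_D", 60, 12, 3, True),   # also bound the GFP control
    Hit("protein_E", 10, 3, 3, False),   # below both cut-offs
]
selected = [h.protein for h in select_interactors(example)]
print(selected)
```

      A filter of this kind establishes reproducibility and specificity relative to the control, but it does not by itself distinguish direct from indirect interactions.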

      Though we agree that the interactions can either be direct or through a third partner, the fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X100 that eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids into intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of various biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Recommendations for the authors):

      I recommend the following experiments to strengthen the data presented:

      (1) Include a non-interacting FLAG-tagged protein as a negative control in the pull-down experiment to strengthen this data.

      We thank the reviewer for the comment. As suggested, we have included non-interacting FLAG-tagged proteins as negative controls in the pulldown experiment. We chose MmpL4 and MmpS5, which were not found in the Wag31 interactome data. We performed pull-down experiments with both of them and included an interactor of Wag31, i.e. Msm2092, as a positive control. Fig. S3b revised shows E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing either FLAG-tagged MmpL4, -MmpS5 or -Msm2092 (Fig. S3c revised). The mixed lysates were pulled down with Cobalt beads that bind to the His-tagged protein and analysed by Western blot, probing with anti-FLAG antibody. The pull-down experiments were performed independently twice, each time with Msm2092 as the positive control (Fig. S3d revised).

      (2) Perform the pull-down experiments using only the Wag31 N-terminus to rule out any role that it may have in the protein-protein interactions.

      We thank the reviewer for the comment. To rule out the possibility of the N-terminal of Wag31 mediating protein-protein interactions, we cloned the N-terminal of Wag31 that comprises the DivIVA-domain in pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> or Wag31<sub>∆N</sub> were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7 previous), so we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing either FLAG-MurG, -SepIVA, -Msm2092 or -AccA3 and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 could bind to all four interactors whereas His-Wag31<sub>∆C</sub> couldn’t, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal. However, we can’t ignore the possibility of other proteins binding to the N-terminal of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we couldn’t perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Minor comments:

      - Please check the legend of Fig. 1g, it appears to be labelled incorrectly.

      We have checked it. It is correct. From Fig. 1g we are trying to reflect on the percentages of cells of the three strains i.e. Msm+ATc, Δwag31-ATc, and Δwag31+ATc displaying rod, round or bulged morphology.

      - For MS/MS analysis, a GFP control is mentioned but it is not indicated how this was incorporated in the data analysis. This information should be added.

      We have incorporated that in the revised methodology.

      - The information presented in Fig. 3a, e and f could be combined in one table.

      We appreciate the idea of the reviewer but we prefer a pictorial representation of the data. It allows readers to consume the information in parts, make quicker comparisons and understand trends easily.

      - Fig. 4c Wag31K20A appears smaller in size than the wild-type protein - why is this the case? Is this not a single amino acid substitution?

      Though K20A is a single amino acid substitution, it alters the mobility of Wag31 on SDS-PAGE gel. The sequence analysis of the plasmid expressing Wag31<sub>K20A</sub> doesn’t show additional mutations other than the desired K20A. The change in mobility could be due to a change in the conformation of Wag31<sub>K20A</sub> or its ability to bind to SDS or both that modify its mobility under the influence of electric field.

      - Please clarify what is contained in the first panel of fig 4e. compared to what is in the second panel.

      The first panel represents CL-Dil-Liposomes before incubation with Wag31-GFP and the second panel shows CL-Dil-Liposomes after incubation with Wag31-GFP. The third panel shows the mixture as observed in the green channel to investigate the localisation of Wag31-GFP in the liposome-protein mix. The fourth panel shows the merge of the second and third panels.

      - The data in Fig 6d suggests higher levels of CL in the ∆wag31 compared to wild-type - how do the authors reconcile this with the MS data in Fig. 2g showing lower CL levels?

      Fig. 6d represents the distribution of CL localisation in the tested strains of mycobacteria whereas Fig. 2g shows the absolute levels of CL in various strains. We place greater confidence in the lipidomics data, which suggest downregulation of CL species. The NAO staining and microscopy serve only to study the localization of CL along the cell, and cannot be used to reliably quantify CL levels. Staining with a probe such as NAO depends on factors such as the hydrophobicity and permeability of the cell wall, which we expect to be severely altered in a Wag31 mutant. Therefore, the increased NAO staining seen in the Wag31 mutant could simply reflect increased uptake of the dye rather than absolute levels of CL. The specificity of staining and localization, however, can be expected to be unaltered.

      Reviewer #3 (Recommendations for the authors):

      Following are suggestions for improving the writing and presentation.

      • Figure 1, the meaning of the yellow arrows present in f and h should be mentioned in the figure legend.

      We have incorporated that in the revised legend. In Fig.1f, the yellow arrowhead represents the bulged pole morphology whereas in Fig. 1h, it indicates intracellular lipid inclusions.

      • Figure 7 legend refers to panels g, h, and i. However, Figure 7 only has panels a-c. The legend lacks a description of panel c.

      We have corrected the typos and the legend.

      • Figure S1, F2-R2 and F3-R3 expected sizes should be stated in the legend of the figure.

      We have updated the legends.

      • Figure S5, is this the same figure as 5e? If so, there is no need for this figure.

      We have removed Fig. S5.

      • Methods need to be written more carefully with enough details. I listed some of the concerns below.

      Detailed methodology was previously provided in the supplementary material and now we have moved it to the materials and methods in the revised manuscript.

      • Line 392, provide more details on western blotting. What is the secondary antibody? What image documentation system was used?

      We have updated the methodology.

      • Line 400, while the methods may be the same as the reference 64, authors should still provide key details such as the way samples were fixed and processed for SEM and TEM.

      We have provided a detailed description of the same in methodology in the revised version.

      • Line 437, how do authors calculate the concentration of liposome to be 10 µM? Do they possibly mean the concentration of phospholipids used to make the liposomes?

      Yes, this is the concentration of total lipids used to make liposomes. 1 μM of Wag31 or its mutants was mixed with 100 nm extruded liposomes containing 10 μM total lipid in separate Eppendorf tubes.

      • Supplemental Line 9, "turns of" should read "turns off".

      We have edited this.

      • Supplemental Line 13, define LHS and RHS.

      LHS (left-hand sequence) and RHS (right-hand sequence) refer to the upstream and downstream flanking regions of the gene of interest.

      • Supplemental Line 20, indicate the manufacturer of the microscope and type of the objective lens.

      We have added these details now.

      • Supplemental Line 31, define MeOH, or use a chemical formula like chloroform.

      MeOH is methanol. We have provided a chemical formula in the revised version.

      • Supplemental Line 53, indicate the concentration of trypsin.

      We have included that in the revised version.

      • Supplemental Line 72, g is not a unit. "30,000 g" should be "30,000x g".

      We have revised this in the manuscript.

      • Supplemental Line 114, provide more details on western blotting. What is the manufacturer of anti-FLAG antibody? What is the secondary antibody? How was the antibody binding visualized? What image documentation system was used?

      We have provided these details in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight on the structural basis for the pharmacology of G protein-coupled receptors.

      We thank the Reviewer for the positive comment on the manuscript and the proposed methods.

      Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most world-widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics, the higher the resolution of such metrics is the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

      We thank the Reviewer for the positive comment on the revised manuscript and mwSuMD. We agree that the choice of supervised metrics is user- and system-dependent. We aim to improve this aspect in the future with the aid of interpretable machine learning.

      Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has potential to provide novel insight into GPCR functionality. Example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of the previously reported structures.

      The method at the focus of our study (mwSuMD) is an enhancement of supervised molecular dynamics that allows supervising two metrics at the same time and uses a score, rather than a tabù-like algorithm, for handling the simulation. Further changes are the seeding of parallel short replicas (walkers) rather than a series of short simulations, and the software implementation on different MD engines (e.g. Acemd, OpenMM, NAMD, Gromacs).

      We agree with the Reviewer that experimental validation of the findings would be advisable, in line with any computational prediction. We are positive that future studies from our group employing mwSuMD will inform mutagenesis and BRET-based experiments.

      Reviewer #2 (Recommendations for the authors):

      As for GLP1R, I remain convinced that the 7LCI would have been better as a reference for all simulations than 7LCJ, also because 7LCI holds a slightly more complete ECD.

      We agree that 7LCI would have been a better starting point than 7LCJ for simulations because it presents the stalk region, contrary to 7LCJ. However, we do not think this would have influenced the output because the stalk is the most flexible segment of GLP1R, and any initial conformation is usually not retained during MD simulations.

      Please, correct everywhere the definition of the 6LN2 structure of GLP1R as ligand-free or apo, because that structure is indeed bound to a negative allosteric modulator docked on the cytosolic end of helix-6.

      We thank the reviewer for this precision. The text has been modified accordingly.

      As for the beta2-AR, the "full-length" AlphaFold model downloaded from the GPCRdb is not an intermediate active state because it is very similar to the receptor in the 3SN6 complex with Gs. Please, eliminate the inappropriate and speculative adjective "intermediate".

      We have changed “intermediate” to “not fully active”, which is less speculative since full activation can be achieved only in the presence of the G protein.

      Incidentally, in that model, the C-tail, eliminated by the authors, is completely wrong and occupies the G protein binding site. It is not clear to me the reason why the authors preferred to used an AlphaFold model as an input of simulations rather than a high resolution structural model, e.g. 4LDO. Perhaps, the reason is that all ICL regions, including ICL3, were modeled by AlphaFold even if with low confidence. I disagree with that choice.

      We understand the reviewer’s point of view. Had we simulated an “equilibrium” receptor-ligand complex, we would have made the same choice. However, the conformational changes occurring during G protein binding are so substantial that the starting conformation of the receptor becomes almost irrelevant as long as a sensible structure is used.

      Reviewer #3 (Recommendations for the authors):

      The revised version of the manuscript is more concise, focusing only on two systems. However, the authors have responded superficially to the reviewers' comments, merely deleting sections of text, making minor corrections, or adding small additions to the text. In particular, the authors have not addressed the main critical points raised by both Reviewer 2 and Reviewer 3. 

      For example, the RMSD values for the binding of PF06882961 to GLP-1R remain high, raising doubts about the predictive capabilities of the method, at least for this type of system.

      What is the RMSD of the ligand relative to the experimental pose obtained in the simulations? This value must be included in the text.

      We have added this piece of information about PF06882961 RMSD in the text, which on page 6 now reads “We simulated the binding of PF06882961, reaching an RMSD to its bound conformation in 7LCJ of 3.79 ± 0.83 Å (computed on the second half of the merged trajectory, superimposing on GLP-1R Cα atoms of TMD residues 150 to 390), using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).”
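
      The kind of measurement quoted above (ligand RMSD after superimposing on receptor Cα atoms, reported as mean ± SD over frames) can be sketched as follows. This is a minimal NumPy illustration, not the authors' analysis script: the Kabsch superposition is a standard choice assumed here, the function names are hypothetical, and the coordinates are synthetic placeholders rather than trajectory data.

```python
import numpy as np


def kabsch(mobile: np.ndarray, reference: np.ndarray):
    """Return rotation R and translation t that best map mobile onto reference."""
    mob_c, ref_c = mobile.mean(axis=0), reference.mean(axis=0)
    H = (mobile - mob_c).T @ (reference - ref_c)      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_c - R @ mob_c
    return R, t


def ligand_rmsd(frame_ca, frame_lig, ref_ca, ref_lig) -> float:
    """Fit on receptor C-alpha atoms, then measure RMSD on the ligand atoms only."""
    R, t = kabsch(frame_ca, ref_ca)
    fitted = frame_lig @ R.T + t
    return float(np.sqrt(((fitted - ref_lig) ** 2).sum(axis=1).mean()))


# Synthetic test system: a "receptor" and a "ligand" with random coordinates.
rng = np.random.default_rng(0)
ref_ca = rng.normal(size=(50, 3)) * 10.0
ref_lig = rng.normal(size=(20, 3)) * 3.0

# Each frame displaces the ligand by a known amount, then rotates and
# translates the whole system, as free tumbling would during a simulation.
angle = 0.7
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
shift = np.array([5.0, -3.0, 2.0])

rmsds = []
for displacement in (1.0, 2.0, 3.0):
    moved_lig = ref_lig + np.array([displacement, 0.0, 0.0])
    frame_ca = ref_ca @ Rz.T + shift
    frame_lig = moved_lig @ Rz.T + shift
    rmsds.append(ligand_rmsd(frame_ca, frame_lig, ref_ca, ref_lig))

print(f"ligand RMSD = {np.mean(rmsds):.2f} ± {np.std(rmsds):.2f} Å")
```

      Because the fit uses only the receptor atoms, the ligand RMSD reflects displacement within the binding site rather than overall tumbling of the complex.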

      Similarly, the activation mechanism of GLP-1R is only partially simulated.

      Furthermore, it is not particularly meaningful to justify the high RMSD values of the SuMD simulations for the binding of Gs to GLP-1R by comparing them with those reported under unbiased MD conditions. "Replica 2, in particular, well reproduced the cryo-EM GLP-1R complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ, and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1<sup>49</sup> (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R."

      Without delving into the accuracy of the various calculations, the authors should acknowledge that comparing protein structures with such high RMSD values has no meaningful significance in terms of convergence toward the same three-dimensional structure.

      The text has been edited to accommodate the reviewer’s suggestion and still give the readers the measure of the high flexibility of Gs bound to GLP-1R. It now reads “Such values do not support convergence with the static experimental structure but are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with G<sub>s</sub> and GLP-1 (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which indicates overall higher flexibility of G<sub>β</sub> and G<sub>γ</sub> compared to G<sub>α</sub>, which acts as a sort of fulcrum bound to GLP-1R.”

      Have the authors simulated the binding of the Gs protein using the experimentally active structure of GLP-1R in complex with the ligand PF06882961 (PDB ID 7LCJ)? Such a simulation would be useful to assess the quality of the binding simulation of Gs to the GLP1R/PF06882961 complex obtained from the previous SuMD.

      We considered performing the Gs binding simulation to the active structure of GLP-1R.

      However, the GLP-1R (and other class B receptors) fully active state, as reported in 7LCJ, depends on the presence of the Gs and can be reached only upon effector coupling. Since it is unlikely that the unbound receptor is already in the fully active state, we reasoned that considering it as a starting point for Gs binding simulations would have been an artifact.

      An example of the insufficient depth of the authors' replies can be seen in their response: "We note that among the suggested references, only Mafi et al report about a simulated G protein (in a pre-formed complex) and none of the work sampled TM6 rotation without input of energy."

      This statement is inaccurate. For instance, D'Amore et al. (Chem 2024, doi: 10.1016/j.chempr.2024.08.004) simulated Gs coupling to A2A as well as TM6 rotation, as did Maria-Solano and Choi (eLife 2023, doi: 10.7554/eLife.90773.1). The former employed path collective variables metadynamics, which is not cited in the introduction or the discussion, despite its relevance to the methodologies mentioned.

      Respectfully, our previous reply is correct, as all of the mentioned articles used enhanced (energy-biased) approaches, so the claim “none of the work sampled TM6 rotation without input of energy” stands. The reference to D’Amore et al. (published after the previous round of reviews of this manuscript) has been added to the introduction; we thank the reviewer for pointing it out. 

      Additionally, SuMD employs a tabu algorithm that applies geometric supervision to the simulation, serving as an alternative approach to enhancing sampling compared to the "input of energy" techniques as called by the authors. A fair discussion should clearly acknowledge this aspect of the SuMD methodology.

      We have now specified in the Methods that a tabu-like algorithm is part of SuMD, which, despite being the parent technique of mwSuMD, is not the focus of the present work. We provide extended references for readers interested in SuMD. mwSuMD, on the other hand, does not use a tabu-like algorithm but rather a continuative approach based on a score to select the best walker for each batch, as described in the Methods.
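
      The general idea of selecting the best-scoring walker in each batch can be illustrated with a toy sketch. This is not the mwSuMD implementation: the function names (`run_batches`, `toy_propagate`, `toy_score`), the one-dimensional "distance" state, and the random-walk propagation are all assumptions made purely for illustration.

```python
import random


def run_batches(start, n_batches, n_walkers, propagate, score, seed=0):
    """Toy best-walker selection: in each batch, launch several short
    walkers from the current state, keep the highest-scoring end point,
    and continue from it. Hypothetical illustration only."""
    rng = random.Random(seed)
    state = start
    trajectory = [state]
    for _ in range(n_batches):
        walkers = [propagate(state, rng) for _ in range(n_walkers)]
        state = max(walkers, key=score)  # best walker seeds the next batch
        trajectory.append(state)
    return trajectory


def toy_propagate(distance, rng):
    # Stand-in for a short unbiased simulation: a random step in a 1-D distance.
    return max(0.0, distance + rng.gauss(0.0, 1.0))


def toy_score(distance):
    # Assumed metric: shorter distance between two binding partners scores higher.
    return -distance


path = run_batches(20.0, n_batches=10, n_walkers=5,
                   propagate=toy_propagate, score=toy_score)
print(len(path), path[0])
```

      In this sketch no energy term is added to the propagation step itself; the only intervention is the choice of which walker to continue from.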

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model, and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: In two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue, and co-localized (to a small extent) with only IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revision they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine; however, they do not use any such complex solutions in the rat studies. Moreover, the sample size of the human experiment is (still) small - it really doesn't belong in the same manuscript with the rat studies.

      Despite the reviewer’s suggestion, we would like to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, which is then followed by basic animal experiments to investigate the underlying mechanisms of kokumi in humans.

      We did not present the additive effects of ornithine on miso soup in the present rat study because our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26) already confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred to plain miso soup by mice.

      Furthermore, we believe that our sample size (n = 22) is comparable to those employed in other studies. For example, the representative kokumi studies by Ohsu et al. (Ref. #9), Ueda et al. (Ref. #10), Shibata et al. (Ref. #20), Dunkel et al. (Ref. #37), and Yang et al. (Ref. #44) used sample sizes of 20, 19, 17, 9, and 15, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats and that the enhancement is mediated, at least in part, within fungiform taste buds. This is an important finding that could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Kokumi can be assessed in humans, as demonstrated by the enhanced kokumi perception observed when miso soup is supplemented with ornithine (Fig. S1). Currently, we do not have a method to measure the same kokumi perception in animals. However, in the two-bottle preference test, our previous companion paper (Fig. 1B in Mizuta et al. 2021, Ref. #26) confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred over plain miso soup by mice.

      Of the three attributes of kokumi perception in humans, the “intensity of whole complex tastes (rich flavor with complex tastes)” was partly demonstrated in the present rat study. In contrast, “mouthfulness (the spread of taste and flavor throughout the oral cavity)” could not be directly detected in animals and had to be inferred in the Discussion. “Persistence of taste (lingering flavor)” was evident at least in the chorda tympani responses; however, because the tongue was rinsed 30 seconds after the onset of stimulation, the duration of the response was not fully recorded.

      It is well accepted in sensory physiology that the stronger the stimulus, the larger the tonic response—and consequently, the longer it takes for the response to return to baseline. For example, Kawasaki et al. (2016, Ref. #45) clearly showed that the duration of sensation increased proportionally with the concentration of MSG, lactic acid, and NaCl in human sensory tests. The essence of this explanation has been incorporated into the Discussion (p. 12).

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      We agree with the reviewer’s comment. Our CT recordings indicate that the effects of kokumi stimuli on taste enhancement occur primarily at the peripheral taste organs. The resulting sensory signals are then transmitted to the brain, where they are processed by the central gustatory and flavor systems, ultimately giving rise to kokumi attributes. This central involvement in kokumi perception is discussed on page 12. Although kokumi substances exert their effects at low concentrations—levels at which the substance itself (e.g., ornithine) does not become more favorable or (in the case of γ-Glu-Val-Gly) exhibits no distinct taste—we cannot rule out the possibility that even faint taste signals from these substances are transmitted to the brain and interact with other taste modalities.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. While I agree with the authors that there is value in conducting parallel tests in both humans and rodents, I think that this can only be done effectively when the measurements in both species are the same. For this reason, I recommend that the human data be published in a separate article.

      Despite the reviewer’s suggestion, we intend to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, and then follow up with basic animal experiments to investigate the potential underlying mechanisms of kokumi in humans.

      In our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26), we confirmed with statistical significance (P < 0.001) that mice preferred miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) over plain miso soup. However, as explained in our response to Reviewer #2’s first concern (in the Public review), it is difficult to measure two of the three kokumi attributes—aside from the “intensity of whole complex tastes (rich flavor with complex tastes)”—in animal models.

      The authors indicated on several occasions (e.g., see Abstract) that ornithine produced "synergistic" effects on the CT nerve response to chemical stimuli. "Synergy" is used to describe a situation where two stimuli produce an effect that is greater than the sum of the response to each stimulus alone (i.e., 2 + 2 = 5). As far as I can tell, the CT recordings in Fig. 3 do not reflect a synergism.

      We appreciate your comments regarding the definition of synergy. In Fig. 5 (not Fig. 3), please note the difference in the scaling of the ordinate between Fig. 5D (ornithine responses) and Fig. 5E (MSG responses). When both responses are presented on the same scale, it becomes evident that the response to 1 mM ornithine is negligibly small compared to the MSG response, which clearly indicates that the response to the mixture of MSG and 1 mM ornithine exceeds the sum of the individual responses to MSG and 1 mM ornithine. Therefore, we have described the effect as “synergistic” rather than “additive.” The same observation applies to the mouse experiments in our previous companion paper (Fig. 8 in Mizuta et al. 2021, Ref. #26), where synergistic effects are similarly demonstrated by graphical representation. We have also added the following sentence to the legend of Fig. 5:

      “Note the different scaling of the ordinate in (D) and (E).”
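
      The criterion discussed above (a mixture response greater than the sum of the individual responses) can be written as a short check. This is a minimal sketch: the function name `classify_interaction`, the `tolerance` parameter, and the example magnitudes are hypothetical and are not measurements from Fig. 5.

```python
def classify_interaction(resp_a, resp_b, resp_mixture, tolerance=0.0):
    """Compare a mixture response with the sum of the two individual
    responses, all expressed on the same scale."""
    additive = resp_a + resp_b
    if resp_mixture > additive + tolerance:
        return "synergistic"
    if resp_mixture < additive - tolerance:
        return "sub-additive"
    return "additive"


# Hypothetical relative magnitudes: a large response to stimulus A,
# a negligible response to stimulus B, and a mixture response above their sum.
print(classify_interaction(1.0, 0.05, 1.6))
```

      In practice the tolerance would need to reflect measurement variability, so that a mixture response only marginally above the sum is not labeled synergistic.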

      Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      Strengths:

      The overall experimental design is solid, based on two-bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants including: inosine 5'-monophosphate (IMP); monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour) and quinine hydrochloride (bitter). Robust effects of ornithine were observed in the cases of IMP, MSG, MPG and sucrose; and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A-positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      Ornithine and umami substances interact to produce synergistic effects in both directions—ornithine enhances responses to umami substances, and vice versa. These effects may depend on the concentrations used, as described in the Discussion (pp. 9–10). Further studies are required to clarify the precise nature of this interaction.

      One issue that is not addressed, and could be usefully addressed in the Discussion, relates to the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate. Thus, an extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses.

      Thank you for this important suggestion. If ornithine reduces the threshold concentrations of tastants—including glutamate—and enhances their suprathreshold responses, then adding ornithine may activate additional taste cells. This effect could explain kokumi attributes such as an “extension of taste distribution” and possibly the “persistence of responses.” As shown in Fig. 2, the lowest concentrations used for each taste stimulus are near or below the thresholds, which indicates that threshold concentrations are reduced—especially for MSG and MPG. We have incorporated this possibility into the Discussion as follows (p.12):

      “Kokumi substances may reduce the threshold concentrations as well as they increase the suprathreshold responses of tastants. Once the threshold concentrations are lowered, additional taste cells in the oral cavity become activated, and this information is transmitted to the brain. As a result, the brain perceives this input as coming from a wider area of the mouth.”

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      Thank you for this important comment. We attempted to identify a specific inhibitor of CaSR. Although we considered using NPS-2143—a commonly used CaSR inhibitor—it is known to also inhibit GPRC6A. We agree that using a specific CaSR inhibitor would be beneficial and plan to pursue this in future studies.

      It would have been helpful to include a positive control kokumi substance in the two bottle preference experiment (e.g., one of the known gamma glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      We agree with this comment. In retrospect, it may have been advantageous to directly compare the potencies of CaSR and GPRC6A agonists in enhancing taste preferences—and to evaluate the sensitivity of these preferences to CaSR and GPRC6A antagonists. However, we did not include γ-Glu-Val-Gly in the present study because we have already reported its supplementation effects on the ingestion of basic taste solutions in rats using the same methodology in a separate paper (Yamamoto and Mizuta, 2022, Ref. #25). The results from both studies are compared in the Discussion (p. 11).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      I am not convinced by the authors' arguments for including the human data. I appreciate their efforts in adding a few (5) subjects and improving the description, but it still feels like it is shoehorned into this paper, and would be better published as a different manuscript.

      This human study is short, but it is complete rather than preliminary. The rationale for us to include the human data as supplementary information is shown in responses to the reviewer’s Public review.

      Minor concerns:

      Page 3 paragraph 1: Suggest "contributing to palatability".

      Thank you for this suggestion. We have rewritten the text as follows:

      “…, the brain further processes these sensations to evoke emotional responses, contributing to palatability or unpleasantness”.

      Page 4 paragraph 2: The text still assumes that "kokumi" is a meaningful descriptor for what rodents experience. Re-wording the following sentence like this could help:

      "Neuroscientific studies in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi taste as experienced by humans. However, to our..."

      Or something similar.

      Thank you for this suggestion. We have rewritten the sentence according to your suggestion as follows:

      "Neuroscientific studies (23,25,30) in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi as experienced by humans".

      Page 7 paragraph 1 - put the concentrations of Calindol and EGCG used (in the physiology exps) in the text.

      We have added the concentrations: “300 µM calindol and 100 µM EGCG”.

      Reviewer #2 (Recommendations for the authors):

      I have included all of my recommendations in the public review section.

      Reviewer #3 (Recommendations for the authors):

      Although the definitions of 'thickness', 'mouthfulness' and 'continuity' have been revised very helpfully in the Introduction, 'mouthfulness' reappears at other points in the MS e.g., Page 4, Results, Line 3; Page 9, Line 3. It is best replaced by the new definition in these other locations too.

      We wish to clarify that our revised text stated, “…to clarify that kokumi attributes are inherently gustatory, in the present study we use the terms ‘intensity of whole complex tastes (rich flavor with complex tastes)’ instead of ‘thickness,’ ‘mouthfulness (spread of taste and flavor throughout the oral cavity)’ instead of ‘mouthfulness,’ and ‘persistence of taste (lingering flavor)’ instead of ‘continuity.’” The term “mouthfulness” was retained in our text, though we provided a more specific explanation. In the re-revised version, we have added “(spread of taste in the oral cavity)” immediately after “mouthfulness.”

      I doubt that many scientific readers will be familiar with the term 'intragemmal nerve fibres' (Page 8, Line 4). It is used appropriately but it would be helpful to briefly define/explain it.

      We have added an explanation as follows:

      “… intragemmal nerve fibers, which are nerve processes that extend directly into the structure of the taste bud to transmit taste signals from taste cells to the brain.”

      I previously pointed out the overlap between the CaSR's amino acid (AA) and gamma-glutamyl-peptide binding site. I was surprised by the authors' response which appeared to miss the point being made. It was based on the impacts of selected mutations in the receptor's Venus FlyTrap domain (Broadhead JBC 2011) on the responses to AAs and glutathione analogs. The significantly more active analog, S-methylglutathione is of additional interest because, like glutathione itself, it is present in mammalian body fluids. My apologies to the authors for not more carefully explaining this point.

      Thank you for this comment. Both CaSR and GPRC6A are recognized as broad-spectrum amino acid sensors; however, their agonist profiles differ. Aromatic amino acids preferentially activate CaSR, whereas basic amino acids tend to activate GPRC6A. For instance, among basic amino acids, ornithine is a potent and specific activator of GPRC6A, while γ-Glu-Val-Gly, in addition to amino acids, is a high-potency activator of CaSR. It remains unclear how effectively ornithine activates CaSR and whether γ-glutamyl peptides also activate GPRC6A. These questions should be addressed in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumour microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.

      We sincerely thank the editors for their valuable and constructive feedback. We are grateful for the recognition of our findings and the importance of identifying transcriptional components in high-grade serous ovarian cancers.

      We acknowledge the editors’ observation regarding the descriptive nature of our study and its limited mechanistic depth. We agree that additional experimental validation would further strengthen our conclusions. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study. In addition, recent reviews of the emerging field of cancer neuroscience emphasize that the field is at an early stage, specifically in terms of a mechanistic understanding of the contributions of tumor-infiltrating nerves to tumor initiation and progression (Amit et al., 2024; Hwang et al., 2024). Nonetheless, we wish to emphasize that emerging mechanistic preclinical studies have demonstrated the influence of tumour-infiltrating nerves on disease progression (Allen et al., 2018; Balood et al., 2022; Darragh et al., 2024; Globig et al., 2023; Jin et al., 2022; Restaino et al., 2023; Zahalka et al., 2017). Several of these studies include contributions from our co-authors and feature in vitro and in vivo research on head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma samples. This study further strengthens the preclinical work by showing, in patient data, the potential relevance of neuronal signaling to disease outcome.

      For instance, Restaino et al. (2023) demonstrated that substance P, released from tumour-infiltrating nociceptors, potentiates MAP kinase signaling in cancer cells, thereby driving disease progression. Crucially, this effect was shown to be reversible in vivo by blocking the substance P receptor (Restaino et al., 2023). These findings offer compelling evidence of the role of tumour innervation in cancer biology.

      Our current study in tumor samples of patients with high-grade serous ovarian cancer identifies a transcriptional component that is enriched for genes for which the protein is located in the synapse. We believe that the previously published mechanistic insights support our findings and suggest that this transcriptional component could serve as a valuable screening tool to identify innervated tumours based on bulk transcriptomes. Clinically, this information is highly relevant, as patients with innervated tumours may benefit from alternate therapeutic strategies targeting these innervations.

      Reviewer #1 (Public review)

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumour microenvironment. Given notable weaknesses like lack of validation cohort or validation using another platform (other than the 11 samples with ST), the data is considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on using consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      We thank the reviewer for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We consider that this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumour microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumour microenvironment in cancer progression.

      We appreciate the recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumour microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We acknowledge the point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would significantly enhance our understanding of how the biological processes captured by these transcriptional components influence cancer progression. We are planning and executing experiments for a future study to provide mechanistic insights into the associations found in this study.

      Our analyses were performed on publicly available bulk and spatially resolved expression profiles. To obtain mechanistic insights in future studies, we plan to integrate spatial transcriptomic data with immunohistochemical analysis of the same tumour samples to validate our findings. Additionally, we have initiated efforts to set up in vitro co-cultures of neurons and ovarian cancer cells. These co-cultures will enable us to investigate how synaptic signaling impacts ovarian cancer cell behavior.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      To respond to this remark, we used survival data from Bolton et al. (2022) and TCGA to investigate associations between TC activity scores and overall survival of patients with ovarian clear cell carcinoma, the second most common subtype of epithelial ovarian cancer, and with other cancer types, respectively. However, we acknowledge the limitations of TCGA survival data, as highlighted in the referenced article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/). Additionally, as shown in Figure 5, we provided evidence of TC121 activity across various cancer types, suggesting broader relevance. For the results of the analyses mentioned above, please refer to our response to remark 1.3 of the recommendation section (page 4).

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the value of validating our results with alternative platforms such as IHC. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Regarding data quality control, we implemented the following measures to ensure the reliability of our analysis:

      Bulk Transcriptional Profiles: To assess data quality, we conducted principal component analysis (PCA) on the sample Pearson product-moment correlation matrix. The first principal component (PCqc), which explains approximately 80-90% of the variance, was used to distinguish technical variability from biological signals (Bhattacharya et al., 2020). Samples with a correlation coefficient below 0.8 relative to PCqc were identified as outliers and excluded. Additionally, MD5 hash values were generated for each CEL file to identify and remove duplicate samples. Expression values were standardized to a mean of zero and a variance of one for each gene to minimize probeset- or gene-specific variability across datasets (GEO, CCLE, GDSC, and TCGA).

      Spatial Transcriptional Profiles: PCA was also applied to spatial transcriptomic data for quality control. Only samples with consistent loading factor signs for the first principal component across all individual spot profiles were retained. Samples failing this criterion were excluded from further analyses.
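
      The quality-control steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the toy data are hypothetical, the first principal component of the sample correlation matrix is approximated by power iteration, and each sample's correlation with PCqc is taken as its loading (eigenvector entry times the square root of the eigenvalue). The 0.8 cut-off and the use of MD5 digests come from the text; scaling with the population standard deviation is an assumption.

```python
import hashlib
import math
import statistics


def duplicate_files(files):
    """Return names whose MD5 digest matches an earlier file (files: name -> bytes)."""
    seen = {}
    dropped = []
    for name, payload in files.items():
        digest = hashlib.md5(payload).hexdigest()
        if digest in seen:
            dropped.append(name)
        else:
            seen[digest] = name
    return dropped


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pcqc_correlations(samples, n_iter=200):
    """Correlation of each sample with PC1 of the sample-by-sample correlation matrix."""
    n = len(samples)
    corr = [[pearson(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(n_iter):  # power iteration towards the leading eigenvector
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return [x * math.sqrt(eigenvalue) for x in v]


def standardize(values):
    """Scale one gene's values to mean 0 and (population) variance 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def consistent_pc1_signs(spot_loadings):
    """Spatial QC rule from the text: all spot loadings on PC1 share one sign."""
    return all(x > 0 for x in spot_loadings) or all(x < 0 for x in spot_loadings)


# Toy data (hypothetical): four similar samples and one unrelated profile.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 8.0],
    [0.9, 2.2, 2.8, 4.1, 5.0, 5.8, 7.2, 7.9],
    [1.2, 2.1, 3.1, 4.2, 4.8, 6.1, 7.0, 8.1],
    [5.0, 1.0, 7.0, 2.0, 8.0, 3.0, 6.0, 4.0],
]
kept = [i for i, c in enumerate(pcqc_correlations(samples)) if c >= 0.8]
```

      In this toy set the fifth profile falls below the 0.8 threshold and is dropped, while the four similar samples are retained.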

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Probably given the lack of some QA/QC procedures it'll be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      Regarding clinical applications, we acknowledge the importance of further exploring strategies targeting synaptic signaling and neurotransmitter release in the tumour microenvironment (TME). As partially discussed in the first version of the manuscript, drugs such as ifenprodil and lamotrigine—commonly used to treat neuronal disorders—can block glutamate release, thereby inhibiting subsequent synaptic signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine blocks the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines demonstrated that ifenprodil significantly reduced cancer cell proliferation, while reserpine triggered apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). The findings highlight the potential of such approaches to disrupt synaptic neurotransmission in the TME.

      To address potential translation of our findings into clinical practice more comprehensively, we have included additional details in the manuscript:

      Section discussion, page 16, lines 338-341:

      “This interaction can be targeted with pan-TRK inhibitors such as entrectinib and larotrectinib. Both drugs are showing promising results in multiple phase II trials, including ovarian cancer and breast cancer patients. Furthermore, a TRKB-specific inhibitor was developed (ANA-12), but has not been subjected to any clinical trials in cancer so far (Ardini et al., 2016; Burris et al., 2015; Drilon et al., 2018, 2017).”

      On page 17, lines 361-374:

      “Strategies to disrupt neuronal signaling and neurotransmitter release in neurons target key elements of excitatory neurotransmission, such as calcium flux and vesicle formation. Drugs like ifenprodil and lamotrigine, commonly used to treat neuronal disorders, block glutamate release and subsequent neuronal signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine prevents synaptic vesicle formation (Reid et al., 2013; Williams, 2001). In vitro studies with HGSOC cell lines have demonstrated that ifenprodil significantly inhibits tumour proliferation, while reserpine induces apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These approaches hold promise for inhibiting neuronal signaling and interactions in the TME.”

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might have been overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found to be convincing. These analyses were integral to enhancing our findings’ robustness and biological relevance.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for this important positive remark.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the cICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We acknowledge the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we incorporated analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we evaluated correlations between gene signatures from a subset of these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach. Please refer to our response to remark 10 for the results of these analyses.

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment.

      However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumour and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize these patterns may also arise from regulatory processes within a single cell type. To investigate further, we used single-cell transcriptional data (~60,000 cell-type-annotated profiles from GSE158722) and projected our transcriptional components onto these profiles to obtain activity scores. The first principal component was removed beforehand to minimize background effects, allowing us to assess each TC’s behavior across diverse cellular contexts. Please refer to our response to remark 2.2 in the recommendations to the authors (page 14) for the results of this analysis.
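
      One way to picture the projection step is sketched below. It rests on assumptions not stated in the response: the first principal component is taken in gene space after centring each gene, it is approximated by power iteration, and an activity score is a plain dot product between a cell's profile and a TC's gene weights. All names, weights, and values are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each gene's mean across cells (rows = cells, columns = genes)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_pc(rows, n_iter=500):
    """Approximate the leading gene-space direction of a centred matrix."""
    n_genes = len(rows[0])
    # Deterministic, non-uniform start; a start orthogonal to PC1 would not converge to it.
    u = [float(g + 1) for g in range(n_genes)]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, u)) for row in rows]  # X u
        u = [sum(s * row[g] for s, row in zip(scores, rows)) for g in range(n_genes)]  # X^T (X u)
        norm = math.sqrt(sum(w * w for w in u))
        u = [w / norm for w in u]
    return u


def remove_component(rows, u):
    """Subtract each cell's projection onto the unit vector u."""
    cleaned = []
    for row in rows:
        score = sum(x * w for x, w in zip(row, u))
        cleaned.append([x - score * w for x, w in zip(row, u)])
    return cleaned


def activity_scores(rows, tc_weights):
    """One score per cell: dot product of its profile with the TC gene weights."""
    return [sum(x * w for x, w in zip(row, tc_weights)) for row in rows]


# Toy example: 4 cells x 3 genes and one hypothetical TC weight vector.
cells = [[2.0, 0.0, 1.0], [4.0, 1.0, 0.0], [6.0, 0.0, 1.0], [8.0, 1.0, 0.0]]
tc = [0.1, 0.9, -0.4]

centred = center_columns(cells)
pc1 = first_pc(centred)
cleaned = remove_component(centred, pc1)
scores = activity_scores(cleaned, tc)
```

      After the removal step, every cell has a zero score along the removed direction, so a component dominated by that background signal no longer drives the activity scores.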

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78:3233–3242.

      Ardini E, Menichincheri M, Banfi P, Bosotti R, Ponti CD, Pulci R, Ballinari D, Ciomei M, Texido G, Degrassi A, Avanzi N, Amboldi N, Saccardo MB, Casero D, Orsini P, Bandiera T, Mologni L, Anderson D, Wei G, Harris J, Vernier J-M, Li G, Felder E, Donati D, Isacchi A, Pesenti E, Magnaghi P, Galvani A. 2016. Entrectinib, a pan-TRK, ROS1, and ALK inhibitor with activity in multiple molecularly defined cancer indications. Mol Cancer Ther 15:628–639.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Burris HA, Shaw AT, Bauer TM, Farago AF, Doebele RC, Smith S, Nanda N, Cruickshank S, Low JA, Brose MS. 2015. Abstract 4529: Pharmacokinetics (PK) of LOXO-101 during the first-in-human Phase I study in patients with advanced solid tumors: Interim update. Cancer Res 75:4529–4529.

    1. Author response:

      We thank the reviewers for their evaluation, for helpful suggestions to improve clarity and accuracy, and for their positive reception of the manuscript. We will incorporate their suggestions in a revised manuscript. Here, we respond to their major comments. 

      The reviewers suggest that a molecular study of Hofstenia’s reproductive systems would be beneficial, as would mechanistic explanations for its unusual reproductive behavior. We agree with the reviewers that both of these would be interesting avenues, although we think this is outside the scope of this current manuscript. This manuscript studies growth and reproductive dynamics in acoels, and establishes a foundation to study its underlying molecular, developmental, and physiological machinery. 

      Our previous molecular work, using scRNAseq and FISH, identified several germline markers. Here, we show that two of them are specific markers of testes and ovaries, respectively. This, together with our new anatomical data, allows us to identify the expression domains of most of these other markers more clearly. Some markers may be expressed in a presumptive common germline that eventually splits into an anterior male germline and posterior female germline. We agree with the reviewers that understanding the dynamics of germline differentiation and its molecular genetic underpinnings would be very interesting, and we hope to address this in future work. 

      As the reviewers note, we do not understand how sperm is stored, how the worm’s own sperm can travel to its ovaries to enable selfing, or how eggs in the ovaries travel within the body. We agree with the reviewers that understanding these processes would be very interesting. Our histological and molecular work so far has been unable to find tube-like structures or other cavities for storage and transport. Potentially, cells could move within the parenchyma. Explaining these events will require substantial effort (including mechanistic studies of cell behavior and ultrastructural studies that the reviewers suggest), and we hope to do this in future work. 

      We agree with Reviewer 1 that it is interesting that Piwi-1 expression is only observed in the ovaries and not in the testes, which is unusual given its broad germline expression in many taxa. There are several possible explanations for this finding (e.g., Piwi-1 could be expressed at low levels in the male germline, other Piwi proteins could be expressed in the male germline, or Piwi may play roles in male germline progenitors that are not co-located with maturing sperm). We do not currently know why this is so, and we will discuss these possibilities in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the role of a novel gene Aff3ir-ORF2 in flow-induced atherosclerosis. They show that the gene is anti-inflammatory in nature. It inhibits the IRF5-mediated athero-progression by inhibiting the causal factor (IRF5). Furthermore, the authors show a significant connection between shear stress and Aff3ir-ORF2 and its connection to IRF5 mediated athero-progression in different established mice models which further validates the ex vivo findings.

      Strengths:

      (1) An adequate number of replicates were used for this study.

      (2) Both in vitro and in vivo validation was done.

      (3) The figures are well presented.

      (4) In vivo causality is checked with cleverly designed experiments.

      We thank you for your positive remarks.

      Weaknesses:

      (1) Inflammatory proteins must be measured with standard methods, e.g., ELISA, as mRNA and protein levels do not always correlate.

      Thanks. We have followed your advice and performed ELISA experiments to measure the concentrations of inflammatory cytokines, including IL-6 and IL-1β. The newly acquired results have been included in Figure 2E (Lines 160-163) in the revised manuscript.

      (2) RNA seq analysis has to be done very carefully. How does the euclidean distance correlate with the differential expression of genes. Do they represent the neighborhood?

      If they do how does this correlation affect the conclusion of the paper?

      We thank the reviewer for these professional comments and apologize for the confusion. The heatmap using Euclidean distance was generated from the expression levels of all differentially expressed genes (identified with DESeq2). Since its interpretation overlaps with the volcano plot presented in Figure 4B, we have moved the heatmap to Figure S5A in the revised manuscript and provided a detailed description in the figure legend (Lines 106-108 in the supporting information). Additionally, to better illustrate the variation among all samples, we have performed PCA and included the new results in Figure 4A of the revised manuscript.
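
      To make the reviewer's question about Euclidean distance concrete, the sketch below computes pairwise Euclidean distances between samples over a set of differentially expressed genes, which is the quantity a distance-clustered heatmap orders samples by. The sample names and expression values are hypothetical, and this is an illustration rather than the authors' plotting code.

```python
import math


def euclidean_matrix(profiles):
    """Pairwise Euclidean distances between samples (profiles: name -> expression vector)."""
    names = list(profiles)
    return {a: {b: math.dist(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical expression of three differentially expressed genes in four samples.
profiles = {
    "wt_1": [1.0, 2.0, 8.0],
    "wt_2": [1.2, 2.1, 7.9],
    "ko_1": [4.0, 6.0, 3.0],
    "ko_2": [4.1, 5.8, 3.2],
}
distances = euclidean_matrix(profiles)
```

      Because the genes were selected for differing between groups, within-group distances are small and between-group distances large almost by construction, which is consistent with the heatmap conveying much the same information as the volcano plot.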

      (3) The volcano plot does not indicate the q value of the shown genes. It is advisable to calculate the q value for each of the genes which represents the FDR probability of the identified genes.

      Thank you for your careful review. We apologize for the incorrect labeling: the values shown are adjusted P values (P.adj). The label for Figure 4B has been corrected in the revised manuscript.
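
      For readers unfamiliar with adjusted P values, the following sketch shows the Benjamini-Hochberg procedure, which is the default adjustment method in DESeq2. It is a plain reimplementation for illustration and omits DESeq2's independent filtering step; the function name and the input values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

      Each adjusted value can be read as the smallest false discovery rate at which that gene would still be called significant, which is what the reviewer's request for a q value amounts to.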

      (4) GO enrichment was done against the Global gene set or a local geneset? The authors should provide more detailed information about the analysis.

      Thank you. We performed GO enrichment analysis against the global gene set. The description of the results has been updated in the revised manuscript (Lines 222–224).

      (5) If the analysis was performed against a global gene set. How does that connect with this specific atherosclerotic microenvironment?

      Thank you for your insightful comments. We have followed your advice and investigated the functional characteristics of these differentially expressed genes in the context of the atherosclerotic microenvironment. The RNA-seq differential gene list was mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), resulting in 363 overlapping genes. These 363 genes were subjected to enrichment analysis using the Gene Ontology (GO) database. GO analysis revealed enrichment in processes related to cell−cell adhesion and leukocyte activation involved in immune response (Figure S5B), which is highly consistent with the observed effects of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B and the description of the results is included in the revised manuscript (Lines 227-233).
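
      The two steps described here, restricting the gene list to a disease-related reference set and testing a GO term for over-representation, can be sketched as follows. The one-sided hypergeometric test is a common choice for over-representation analysis and is assumed here; the response does not state which test or tool was used. Gene identifiers and set sizes are hypothetical.

```python
from math import comb


def restrict(gene_list, reference):
    """Genes from gene_list that also appear in the reference set."""
    return sorted(set(gene_list) & set(reference))


def enrichment_p(overlap, background_size, term_size, list_size):
    """Upper-tail hypergeometric probability of seeing at least `overlap` term genes."""
    upper = min(term_size, list_size)
    hits = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(background_size, list_size)


# Hypothetical identifiers: differentially expressed genes and a disease gene set.
deg = ["g1", "g2", "g3", "g4", "g5"]
athero = {"g2", "g4", "g5", "g9"}
overlapping = restrict(deg, athero)

# Same overlap (2 of 3 genes in a 40-gene term) against two background sizes.
p_global = enrichment_p(2, 1000, 40, len(overlapping))
p_local = enrichment_p(2, 100, 40, len(overlapping))
```

      The comparison of the two backgrounds illustrates why the reviewer asked about the reference gene set: the same overlap is far more surprising against a large global background than against a small, disease-focused one.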

      (6) What was the basal expression of genes and how did the DGE (differential gene expression) values differ?

      Thanks for the comments. The RNA-sequencing data has been submitted to GEO datasets (GSE286206), making the basal gene expression data available to readers.

      The differential expression analysis was performed using DESeq2 (v1.4.5) (PMID: 25516281) with a criterion of 1.5-fold change and P<0.05. We have included the description in the revised manuscript in Lines 220-222 and Lines 575-576.
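
      The stated selection criterion translates into a simple filter on the log2 scale, where a 1.5-fold change corresponds to an absolute log2 fold change of about 0.585. The sketch below assumes the fold-change rule applies in both directions and leaves open whether the raw or adjusted P value was thresholded; gene names and values are hypothetical.

```python
import math

FOLD_CHANGE = 1.5   # from the text
P_CUTOFF = 0.05     # from the text


def is_differential(log2_fold_change, p_value):
    """True when a gene passes both the fold-change and the P value criterion."""
    return abs(log2_fold_change) >= math.log2(FOLD_CHANGE) and p_value < P_CUTOFF


# Hypothetical result rows.
results = [
    {"gene": "gene_a", "log2fc": 1.20, "p": 0.001},
    {"gene": "gene_b", "log2fc": 0.30, "p": 0.001},
    {"gene": "gene_c", "log2fc": -0.80, "p": 0.200},
    {"gene": "gene_d", "log2fc": -0.70, "p": 0.010},
]
selected = [r["gene"] for r in results if is_differential(r["log2fc"], r["p"])]
```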

      (7) How was IRF5 picked from GO analysis? was it within the 20 most significant genes?

      Sorry for the confusion. IRF5 was not identified through GO analysis. To determine the upstream transcriptional regulators, we used the ChEA3 database to predict potential upstream transcription factors based on all differentially expressed genes. The top 20 transcription factors were selected based on their scores. To further explore their relationship with atherosclerosis, these top 20 transcription factors were mapped to the atherosclerosis-related gene list in the DisGeNET database. IRF5 and IRF8 were the only two overlapping genes. To clarify this process, we have included a more detailed description of the IRF prediction approach in the revised manuscript (Lines 234–239).
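
      The shortlisting logic described above reduces to taking the top-ranked transcription factors and intersecting them with a disease gene list. The sketch below uses placeholder identifiers; the positions of IRF5 and IRF8 in the ranking are arbitrary and not taken from the study, and no ChEA3 or DisGeNET API is called.

```python
def shortlist(ranked_tfs, disease_genes, top_n=20):
    """Top-ranked transcription factors that also appear in the disease gene list."""
    return [tf for tf in ranked_tfs[:top_n] if tf in disease_genes]


# Placeholder ranking of 30 transcription factors (hypothetical order).
ranked = [f"TF{i:02d}" for i in range(1, 31)]
ranked[4] = "IRF8"
ranked[11] = "IRF5"

# Hypothetical disease-associated genes; TF25 is ranked outside the top 20.
disease = {"IRF5", "IRF8", "TF25", "VCAM1"}
candidates = shortlist(ranked, disease)
```

      A disease-associated factor ranked below the cut-off is excluded, so the result depends on the choice of `top_n` as well as on the reference list.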

      (8) Microscopic studies should be done more carefully? There seems to be a global expression present on the vascular wall for Aff3ir-ORF2 and the expression seems to be similar to AFF3 in Figure 1.

      We thank the reviewer for the valuable suggestion. We have followed your advice and provided more representative images in Figure 1F.

      Reviewer #2 (Public review):

      Summary:

      The authors recently uncovered a novel nested gene, Aff3ir, and this work sets out to study its function in endothelial cells further. Based on differences in expression correlating with areas of altered shear stress, they investigate a role for the isoform Aff3ir-ORF2 in endothelial activation and development of atherosclerosis downstream of disturbed shear stress. Using a knockout mouse model and in vivo overexpression experiments, they demonstrate a strong potential for Aff3ir-ORF2 to alleviate atherosclerosis. They find that Aff3ir-ORF2 interacts with the pro-inflammatory transcription factor IRF5 and retains it in the cytoplasm, hence preventing upregulation of inflammation-associated genes. The data expands our knowledge of IRF5 regulation which could be relevant to researchers studying various inflammatory diseases as well as adding to our understanding of atherosclerosis development.

      Strengths:

      The in vivo data is solid using immunofluorescence staining to assess AFF3ir-ORF2 expression, a knockout mouse model, overexpression and knockdown studies, and rescue experiments in combination with two atherosclerotic models to demonstrate that Aff3ir-ORF2 can lessen atherosclerotic plaque formation in ApoE<sup>-/-</sup> mice.

      We thank you for your positive remarks.

      Weaknesses:

      While the in vivo data is generally convincing, a few data panels have issues and will need addressing. Also, the knockout mouse model will need to be described, since the paper referred to in the manuscript does not actually report any knockout mouse model. Hence it is unclear how Aff3ir-ORF2 is targeted, but Figure S2B shows that targeting is partial, since about 30% expression remains at the RNA level in MEFs isolated from the knockout mice.

      We thank you for the valuable comments. 

      First, we have followed your advice and included detailed information on the generation of the knockout mice in the revised manuscript (Lines 405-415). Additionally, the genotyping results have been included in new Figure S3A.

      Second, we acknowledge your concern about the knockout efficiency of ORF2 in mice. While the PCR assay indicated approximately 30% residual expression, our Western blot analysis of aorta samples demonstrated that ORF2 protein was barely detectable in knockout mice, as shown in new Figure S3B-C. Besides, our in vivo experiments using MEF from WT and AFF3ir-ORF2<sup>-/-</sup> mice (Figure 4I) further confirmed successful knockout. 

      Third, we have included a discussion addressing the discrepancies between PCR and Western blot results. In addition to technical differences between the two methods, the nature of AFF3ir-ORF2 may also contribute to these inconsistencies. The parent gene AFF3 is located in a genetically variable region and can be excised via intron 5 to form a replicable transposon, which translocates to other chromosomes and has been linked to leukemia (PMID: 34995897, 12203795, 12743608, and 17968322). AFF3ir is located in intron 6 and therefore lies within the transposon, which may complicate the measurement of its expression. Replicable transposons can exist as extrachromosomal elements, allowing them to be inherited across generations. We have included this discussion in the revised manuscript (Lines 188-196).

      While the effect on atherosclerosis is clear, the conclusion that this is the result of reduced endothelial cell activation is not supported by the data. The mouse model is described as a global knockout and the shRNA knockdowns (Figure 5) and overexpression data in Figure 2 are not cell type-specific. Only the overexpression construct in Figure 6 uses an ICAM-2 promoter construct, which drives expression in endothelial cells, though leaky expression of this promoter has been reported in the literature. Therefore, other cell types such as smooth muscle cells or macrophages could be responsible for the effects observed.

      Thank you for your critical comment. To address your concern, we have made the following three revisions:

      First, we have analyzed the expression of AFF3ir-ORF2 in the vascular wall with or without intima in WT and AFF3ir-ORF2 knockout mice. As shown in Figure 1B and Figure S1A, while the expression of AFF3ir-ORF2 was notably downregulated in the aortic intima of athero-prone regions compared to the protective region, it remained largely unchanged in the aortic wall without intima across different regions of the aorta. This suggested that AFF3ir-ORF2 might play a predominant role in endothelial cells rather than other cell types in the context of shear stress.

      Second, we have used human endothelial cells (HUVECs) to further confirm our findings. As shown in Figure 2C and Figure S2B, we found that AFF3ir-ORF2 overexpression could attenuate disturbed shear stress-induced IRF5 nuclear translocation and the expression of inflammatory genes in HUVECs, suggesting the potential anti-inflammatory effects of AFF3ir-ORF2 in endothelial cells.

      Third, we agree with the reviewer’s comment that we cannot completely exclude the potential involvement of other cell types. Hence, we have included a limitation statement in the Discussion (Lines 341-344).

      The weakest part of the manuscript is the in vitro experiment using some nonidentifiable expression differences. The data is used to hypothesise on a role for IRF5 in the effects observed with Aff3ir-ORF2 knockout.

      Thank you for the comments. To address your concerns, we have made the following two changes:

      First, we have further investigated the functional features of the differential genes from the RNA-seq in the context of the atherosclerotic microenvironment. The differential gene list was mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), and a total of 363 genes overlapped. These 363 genes were subjected to enrichment analysis using the Gene Ontology (GO) database. GO analysis showed that these genes were mainly enriched in cell−cell adhesion and leukocyte activation involved in immune response, which aligns with the effect of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B and the description of the results has been updated in the revised manuscript (Lines 227-233).

      Second, we have further verified the RNA-seq results in vitro. We analyzed several classical inflammatory factors, including ICAM-1, CCL5, and CXCL10, whose mRNA levels were significantly downregulated in RNA-seq and which were also identified as target genes of IRF5. We found that AFF3ir-ORF2 deficiency aggravated, while AFF3ir-ORF2 overexpression attenuated, the expression of ICAM-1, CCL5, and CXCL10 induced by disturbed shear stress (New Figure S5D). Besides, the regulation of ICAM-1 by AFF3ir-ORF2 was confirmed at both protein and mRNA levels in HUVECs (Figure 2C-D and Figure S2B). 

      Overall, the paper succeeds in demonstrating a link between Aff3ir-ORF2 and atherosclerosis, but the cell types involved and mechanisms remain unclear. The study also shows a functional interaction between Aff3ir-ORF2 and IRF5 in embryonic fibroblasts, but any relevance of this mechanism for atherosclerosis or any cell types involved in the development of this disease remains largely speculative.

      Thank you for all the valuable comments. The specific responses have been provided above. Briefly, we have followed your advice and further confirmed the regulation of AFF3ir-ORF2 on IRF5 in endothelial cells. Besides, the RNA-seq results have been further analyzed, and partial results have been verified in endothelial cells to support the anti-inflammatory role of AFF3ir-ORF2. We greatly appreciate the reviewer’s insightful comments, which guided our revisions and contributed to significantly improving the paper.

      Reviewer #3 (Public review):

      This study is to demonstrate the role of Aff3ir-ORF2 in the atheroprone flow-induced EC dysfunction and ensuing atherosclerosis in mouse models. Overall, the data quality and comprehensiveness are convincing. In silico, in vitro, and in vivo experiments and several atherosclerosis were well executed. To strengthen further, the authors can address human EC relevance.

      We thank you for your positive remarks and insightful comments.

      Major comments:

      (1) The tissue source in Figures 1A and 1B should be clarified, the whole aortic segments or intima? If aortic segment was used, the authors should repeat the experiments using intima, due to the focus of the current study on the endothelium.

      We thank you for the suggestion. The tissue used in Figures 1A and 1B was from aortic intima. The description has been updated for clarity in the revised manuscript on Lines 114-125. 

      (2) Why were MEFs used exclusively in the in vitro experiments? Can the authors repeat some of the critical experiments in mouse or human ECs?

      Thank you for this insightful comment. Isolation and culture of mouse primary aortic ECs are notoriously difficult, and shear stress experiments require a large number of cells. Considering that MEFs exhibit responses consistent with those of ECs, as has been carefully demonstrated (PMID: 23754392), we used MEFs in our in vitro experiments.

      However, following your valuable advice, we have now employed human ECs (HUVECs) to confirm our findings. Consistent with our results in MEFs, we found that AFF3ir-ORF2 overexpression reduced the expression of inflammatory genes induced by disturbed shear stress at both protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). Notably, despite the significant anti-inflammatory effects of AFF3ir-ORF2, the sequence of this gene is not conserved in Homo sapiens and lacks an initiation codon, which is why we did not proceed with loss-of-function experiments.

      (3) The authors should explain why AFF3ir-ORF2 overexpression did not affect the basal level expression of ICAM-1, VCAM-1, IL-1b, and IL-6 under ST conditions (Figure 2A-C).

      We thank you for raising this critical question. Indeed, we found that AFF3ir-ORF2 overexpression did not affect the basal level of inflammatory genes under ST conditions, while it exerted anti-inflammatory effects under OSS conditions. One underlying reason might be the relatively low expression of inflammatory genes under ST compared to OSS conditions. Additionally, as our findings suggested, AFF3ir-ORF2 exerted its anti-inflammatory role by binding to IRF5 and inhibiting IRF5 nuclear translocation. However, as shown in Figure 4I, IRF5 might be predominantly localized in the cytoplasm rather than the nucleus under ST conditions, leaving little nuclear IRF5 for AFF3ir-ORF2 to restrain.

      We have included the description in the revised manuscript on Lines 157-163.

      (4) Please include data from sham controls, i.e., right carotid artery in Figure 2E.

      Thank you for the suggestion. We have followed your advice and included sham controls (staining of the right carotid arteries) in Figure S2E.

      (5) Given that the merit of the study lies in the effect of different flow patterns, the lesion areas in AA and TA (Figure 3B, 3C) should be separately compared.

      We have followed your valuable suggestion and included the additional statistical results in Figure 3C in the revised manuscript.

      (6) For confirmatory purposes for the variations of IRF5 and IRF8, can the authors mine available RNA-seq or even scRNA-seq data on human or mouse atherosclerosis? This approach is important and could complement the current results that are lacking EC data.

      Thank you for your valuable suggestion. In the present study, we found that disturbed flow did not alter the protein level of IRF5 but promoted its nuclear translocation. Following your advice, we analyzed the expression of IRF5 in human ECs (GSE276195) and atherosclerotic mouse arteries (GSE222583) using public databases. Consistently, IRF5 did not show significant changes in mRNA levels under these conditions (Figure S5E-F), suggesting that the regulation of IRF5 in the context of disturbed flow or atherosclerosis is primarily post-translational.

      (7) With the efficacy of using AAV-ICAM2-AFF3ir-ORF2 in atherosclerosis reduction (Figure 6), the authors are encouraged to use lung ECs isolated from the AFF3ir-ORF2<sup>-/-</sup> mice to recapitulate its regulation of IRF5.

      We greatly appreciate your valuable suggestion to use lung ECs from mice. We have observed that AFF3ir-ORF2 deficiency enhanced the nuclear translocation of IRF5 induced by OSS. Notably, the transcriptional levels of IRF5 were minimally affected by AFF3ir-ORF2 deficiency. Hence, recapitulating the regulation of IRF5 with lung ECs isolated from the AFF3ir-ORF2<sup>-/-</sup> mice would require treating lung ECs with OSS followed by isolation of subcellular components. However, both in vitro shear stress treatment and subcellular fractionation require a large number of cells, and mouse lung ECs are difficult to culture and to maintain over several passages. Therefore, we hope the reviewer understands that these experiments were not performed. As an alternative, we have confirmed the changes in IRF5 transcriptional activity caused by AFF3ir-ORF2 manipulation by analyzing the expression of its target genes, identified from the RNA-seq results, in both the intima of mouse aorta (Figure S5C-D) and HUVECs (Figure 2C-D and Figure S2B). Our findings show that AFF3ir-ORF2 deficiency increases, while its overexpression decreases, the expression levels of IRF5-targeted genes in endothelial cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2H - As I understand it, this is MFI measurement of VCAM. Please change accordingly.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      My major concern is the use of MEFs for all in vitro experiments. All experiments should be done in endothelial cells if the aim is to show a mechanism relevant to endothelial activation and atherosclerosis. Lines 314-316 of the conclusion are absolutely not supported by the data.

      Thank you for the insightful comment. Following your advice, we have employed human ECs (HUVECs) to confirm our findings. Consistent with the findings in MEFs, we found that AFF3ir-ORF2 decreased the expression of inflammatory genes induced by disturbed shear stress, both at protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). 

      Since the in vivo experiments are not cell type-specific, it would be important to test and compare the expression of Aff3ir-ORF2 in endothelial cells as well as smooth muscle and macrophages to support any claim of cell type involvement in the effects observed.

      We thank you for the valuable suggestion. In the revised manuscript, we have followed your suggestion and analyzed the expression pattern of AFF3ir-ORF2 in different regions of the aorta with or without endothelium. We observed a marked reduction in AFF3ir-ORF2 expression in the intima of the aortic arch compared to that in the intima of the thoracic aorta (Figure 1B-C). In contrast, the expression of AFF3ir-ORF2 in the media and adventitia was comparable between the aortic arch and thoracic aorta (Figure S1A-B). These findings provide further evidence supporting the predominant role of endothelial cells. The description has been modified accordingly in the revised manuscript on Lines 121-134.

      The results of the RNA-seq experiment should be disclosed. The experiment should be deposited on GEO or similar and a table of differentially expressed genes added to the manuscript.

      Thank you for the suggestion. We have followed your advice and submitted the RNA-sequencing data to GEO datasets (GSE286206). Besides, a table of differentially expressed genes has been included in the revised manuscript as Table S3.

      Minor comments:

      (1) Figure 1A. Missing the labels of the target.

      Thanks. Corrected. 

      (2) Figure 1D. Cell alignment in AA compared to TA suggests that the image is of the outer curvature, but Figure 1F is showing that the outer curvature is expressing more ORF2 than the inner. Why was the outer curvature chosen for this panel and is it true to conclude on that assumption that expression of ORF2 compares as TA > Outer > Inner curvature?

      We thank you for the insightful suggestion. We have followed your advice and performed en-face immunofluorescence staining of AFF3ir-ORF2 and quantified AFF3ir-ORF2 expression in the AA inner, AA outer, and TA regions. As shown in new Figure 1D-E, the results indeed indicate that AFF3ir-ORF2 expression ranks as TA > AA outer > AA inner.

      (3) Figure 2H. Target mislabelled as ICAM-1 instead of VCAM-1.

      Thanks. Corrected. 

      (4) Figure S1A. VE-cad staining and cell shape differ between control and overexpression. Is this a phenotype or are different areas of the vasculature shown, which would make it hard to interpret since Aff3ir-ORF2 levels differ in different vessel areas?

      We thank the reviewer for raising this important question. For Figure S1A, only common carotid arteries were used for the staining. The potential differences in cell shape observed might be due to variations in the procedure during immunofluorescence staining. To avoid any misinterpretation, more representative images have been provided in the revised Figure S2C.

      (5) Figure 3D-G. Images are not representative of the quantification results.

      Thank you. The images in the revised Figure 3D and Figure 3F have been replaced with more representative ones.

      (6) Line 220. Data for IRF8 are not shown in the figure to support this claim.

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C.

      (7) Figure 6F. AAV-AFF3ir-ORF2 panel order inverted.

      Thanks. Corrected. 

      (8) Line 401. Type "hat" instead of "h at".

      Sorry for the typo. Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) The rationale for the following sentence (lines 126-128) is lacking: "Moreover, we observed the expression of AFF3ir-ORF2 in longitudinal sections of the mouse aorta (B. Li et al., 2019)".

      Thanks. The rationale for these experiments has been included in the revised manuscript on Lines 127-129.

      (2) The source of antibodies against AFF3ir-ORF1 and AFF3ir-ORF2 used in western blot and immunostaining experiments were not mentioned in the manuscript.

      Thanks. The antibody information has been included in the Methods section on Lines 456-457 and 510-511.

      (3) The rationale and data interpretation is not clear for the following sentence (lines 220-221): "In addition, neither IRF5 nor IRF8 expression was regulated by AFF3ir-ORF2 (Figure 4F)".

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C. The sentence has been modified accordingly on Lines 253-254.

      (4) The quality of AFF3ir-ORF2 blot in Figure 4I needs improvement.

      Thanks. More representative images have been included in Figure 4I.

      (5) It appears that AFF3ir-ORF2 was present in both cytoplasm and nucleus. Does AFF3ir-ORF2 have a nuclear entry peptide? Also, the nuclear entry of AFF3ir-ORF2 can be enhanced by an immunofluorescence staining experiment.

      Thank you for your insightful comments. Indeed, although we did not observe any significant changes in the subcellular localization of AFF3ir-ORF2 under shear stress conditions, our immunostaining results revealed that AFF3ir-ORF2 is localized in both the cytoplasm and nucleus. To explore whether AFF3ir-ORF2 contains nuclear localization signals, we utilized the NLStradamus tool (http://www.moseslab.csb.utoronto.ca/NLStradamus/) to analyze its sequence. The prediction indicated that AFF3ir-ORF2 lacks a nuclear localization signal.

    1. Author response:

      Reviewer 1: “The authors over-emphasized this study's relevance to RP disease (i.e. patients and mammals are not capable of regeneration like zebrafish).”

      It is true that humans and other mammals are not capable of regeneration. This is why we and many other groups study zebrafish to identify mechanisms of regeneration that successfully form new rods. That said, our previous paper on the molecular basis of retinal remodeling in this zebrafish model system (Santhanam et al., 2023; Cell Mol Life Sci. 2023;80(12):362) revealed remarkable similarities in the stress and physiological responses of rods, cones, RPE and inner retinal neurons to those in mammalian RP models. Thus, we believe this zebrafish is an adequate model of RP and an excellent model to study rod regeneration.

      Reviewer 1: “They under-explained this regeneration's relevance or difference to normal developmental process, which is pretty much conserved in evolution.”  and:

      Reviewer 3: “It would also benefit from integration with single-cell multiome data from developing retinas (Lyu, et al. 2023).”

      It is an excellent suggestion to compare the regenerative response we have studied in a chronic degeneration/regeneration model to the trajectory of developmental rod formation. Lyu et al. (2023) found that while retinal regeneration has similarities to retinal development, it does not precisely recapitulate the same transcription factors and processes. Any differences between the trajectory we describe and that revealed in developmental studies would be enlightening. We intend to perform such analyses and add them to a revised manuscript in the future.

      Reviewer 2: “Perhaps the authors can consider explaining why the Prdm1a knock-down cells would have a higher Retp1 signal per cell in Fig 9B. Is this a representative picture? This appears to contradict Figure 8's conclusion, although I could tell that the number of Retp1+ cells in the ONL appears to be lower.”

      These are different experimental paradigms.  Figure 8 shows knockdown 48 hours after injection, at which time prdm1a knockdown is affecting rhodopsin expression directly.  That experiment investigated whether prdm1a knockdown affected progenitor proliferation.  Figure 9 shows a time point 6 days after injection, at which time we were asking if prdm1a knockdown affected differentiation of progenitors into rods. 

      Reviewer 2: “The authors noted "Surprisingly, the knockdown of prdm1a resulted in a significantly higher number of rhodopsin-positive cells in the INL (p=0.0293)", while it appears in Figure 9B, 9C that the difference is 2 cells vs 0 in a rightly broader field. It seems to be too strong of a statement for this effect.”

      This was a very unexpected finding.  We included statistics (Figure 9D) to support the finding, so we don’t think it is too strong a statement to make.  Speculation as to what might cause this is fascinating.  Are Muller cells producing progenitors that fail to migrate to the ONL before differentiating into rods?  The lack of BrdU labeling does not support this idea.  Do neurogenic progenitor cells in the INL differentiate towards rods via a pathway that does not require prdm1a?  Perhaps.  Perhaps there are other explanations.

      Reviewer 2: “It appears to this reviewer that the proteomic data didn't reveal much in line with the overall hypothesis or the mechanism, and it's unclear why the authors went for proteomics rather than bulk RNA-seq or ChIP-seq for a transcription factor knock-down experiment. Overall this is a minor point.”

      We agree that bulk RNA sequencing would provide a similar answer, possibly with greater sensitivity.  We chose proteomics for two reasons: 1) We wanted an independent assessment of the knockdown effects that could evaluate whether the knockdowns worked and what pathways were affected.  Since our pathway comparison is to single cell RNAseq data, bulk RNA seq did not seem to be fully independent. 2) Because we used translation-blocking antisense oligos for most knockdown experiments, we did not expect the transcript abundance of the targeted gene to be affected, although these oligos can lead to target transcript degradation.  Thus, we were not likely to be able to validate that our knockdown worked with this technique. 

      Reviewer 3: “The gene regulatory network analysis here would also benefit from the addition of matched scATAC-Seq data, …”

      This is certainly true, and the reviewer points to several studies that have made excellent use of this strategy.  Given the 1-2 year timeline to obtain and analyze such data, it is unlikely that we will be able to incorporate such data in our revised manuscript, but we hope to do so for follow-up studies.

      Reviewer 3: “The description of the time points analyzed is vague, stating only that "fish from 6 to 12 months of age were analyzed". Since photoreceptor degeneration is progressive, it is unclear how progenitor behavior changes over time, or how the gene expression profile of other cell types such as microglia, cones, or surviving rods is altered by disease progression.”

      We have shown in a previous study (Santhanam et al. Cells. 2020;9(10)) that rod degeneration and regeneration are in a steady state from at least 4 to 8 months of age, and in other experiments in the lab at least to 12 months of age.  In this age range, regeneration keeps up with the pace of degeneration, both of which are very fast.  This encompasses the cell types that we specifically study in this manuscript.  The reviewer is right that other cell types could undergo changes.  This is a separate topic of study in the lab.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance, in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Figure 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 appears to no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced fold increase in insulin secretion under these circumstances, as shown in Figure 1F. We interpret these results as a defect in the capacity of the GLP-1R to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. As cholesterol-loaded MβCD raises basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well.

      We have performed a simple experiment to investigate this possibility: insulin secretion following stimulation with a secretagogue cocktail (20 mM glucose, 30 mM KCl, 10 µM FSK and 100 µM IBMX) in islets +/- MβCD/cholesterol loading to determine if maximal stimulation had been reached or not in our original experiment. This experiment, now included in Supplementary Figure 1C, demonstrates that insulin secretion can increase up to ~4% (from ~2%) in our islets, supporting our initial conclusion. We have also included absolute insulin concentrations as well as percentages of secretion for all the experiments included in the study in the new Supplementary File 1 to improve the completeness of the report.
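
      The distinction drawn above between a reduced fold increase and a secretion ceiling can be illustrated with a small arithmetic sketch. All numbers below are hypothetical placeholders, not data from the study; only the approximate ceiling of ~4% echoes the value quoted above, and the function and variable names are introduced here purely for illustration.

```python
def fold_increase(stimulated: float, basal: float) -> float:
    """Return stimulated secretion expressed relative to basal secretion."""
    if basal <= 0:
        raise ValueError("basal secretion must be positive")
    return stimulated / basal


# Hypothetical % of total insulin content secreted (NOT study data).
vehicle = {"basal": 0.5, "exendin4": 2.0}
cholesterol_loaded = {"basal": 1.0, "exendin4": 2.0}

# Assumed ceiling, loosely based on the ~4% reached with the secretagogue cocktail.
max_secretion = 4.0

vehicle_fold = fold_increase(vehicle["exendin4"], vehicle["basal"])
loaded_fold = fold_increase(cholesterol_loaded["exendin4"], cholesterol_loaded["basal"])

# Headroom left between the exendin-4 response and the assumed ceiling.
headroom = max_secretion - cholesterol_loaded["exendin4"]

print(f"vehicle fold increase: {vehicle_fold:.1f}")
print(f"cholesterol-loaded fold increase: {loaded_fold:.1f}")
print(f"headroom below ceiling: {headroom:.1f} percentage points")
```

      In this toy case the fold increase halves because basal secretion doubles, while the stimulated value still sits well below the assumed ceiling, which is the pattern one would expect if the response is not limited by saturation.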

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Figure 1H, I) are relatively variable. We have therefore performed 2 extra biological repeats of this experiment (for a total n of 7). Results now show a significant increase in exendin-4-stimulated secretion with no change in basal secretion in islets pre-incubated with LPDS/simvastatin.  

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1R agonist responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We have highlighted this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We however agree that it is of interest to investigate if cholesterol loading affects GLP-1R diffusion. To this end, we have performed further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol (new Supplementary Figures 1D and 1E). Interestingly, results show significantly increased plasma membrane diffusion of exendin-4-stimulated receptors, with no change in basal diffusion, following MβCD/cholesterol loading. This behaviour differs from that of the V229A mutant receptor which shows reduced diffusion under basal conditions, a pattern that mimics that of the WT receptor under low cholesterol conditions (by pre-treatment with LPDS/simvastatin).

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      We have expanded on the rationale behind the use of Laurdan to assess behaviours of lipid packed membrane nanodomains in the methods, results and discussion of the revised manuscript.

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Figures 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We have included a schematic of this assay in the new Supplementary Figure 3 to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, we have included ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Figure 1E in the new Supplementary Figure 4.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We have included these data in Supplementary Figure 1A.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we have performed binding affinity experiments in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells, which show no differences (new Supplementary Figure 2D).

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we have added this worthy point to the discussion and hypothesise on possible effects of GLP-1R mutants with modified cholesterol interactions on central GLP-1R actions in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      There are no line numbers

      These have now been added.

      Abstract: "Cholesterol is a plasma membrane enriched lipid" - sorry for being finicky, but shouldn't this read; "a lipid often enriched in plasma membranes"

      We have modified the abstract to state that: “Cholesterol is a lipid enriched at the plasma membrane”.

      p. 4 "Moreover, islets extracted from high cholesterol-fed mice". How do you "extract islets"?

      We have replaced the term “extracted” with “isolated”. Islet isolation is described in the methods section of the paper.

      p. 4 The sentence "These effects were accompanied by decreased GLP-1R plasma membrane diffusion under vehicle conditions, measured by Raster Image Correlation Spectroscopy (RICS) in rat insulinoma INS-1 832/3 cells with endogenous GLP-1R deleted [INS-1 832/3 GLP-1R KO cells (27)] stably expressing SNAP/FLAG-tagged human GLP-1R (SNAP/FLAG-hGLP-1R), an effect that is normally triggered by agonist binding (28), as also observed here (Supplementary Figure 1C, D)" is a masterpiece of complexity. Perhaps breaking up would facilitate reading?

      This paragraph has now been modified in the revised manuscript.

      p. 5. I cannot evaluate the "coarse grain molecular dynamics" studies.

      Reviewer #2 (Recommendations for the authors):

      I view this as an excellent manuscript with very comprehensive work and clear translational relevance. I don't think any further experiments are needed for the scope outlined in this manuscript. The discussion is already long but a short postulation on how this may translate to GLP-1R-cholesterol interactions in other cell types, specifically neurons with the intent on manipulating satiation and nausea, could be worthwhile.

      This has now been added.

      The only thing for readability I would suggest is a sentence in the results mentioning why you're doing the Laurdan analysis, and what is the output for assessing 'receptor activity' in the membrane and endosomes.

      Both points have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply well-established approaches in studies of acute to chronic HIV infection for studies of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      We have added a direct discussion of the parallels/differences between HIV and HCV throughout the discussion, including at lines 308-310 and 315-327.

      Lines 308-310: “In fact, many parallels can be drawn between HIV infections and HCV infections in the context of emerging viral species that escape T cell immune responses.”

      Lines 315-327: “One major difference between HCV and HIV infection is the event where patients infected with HCV have an approximately 25% chance to naturally clear the infection as opposed to just achieving viral control in HIV infections. Here, we probed the underlying mechanism, and questioned how the host immune response and HCV mutational landscape can allow the virus to escape the immune system. To understand this process, taking inspiration from HIV studies (24), a quantitative analysis of viral fitness relative to viral haplotypes was conducted using longitudinal samples to investigate whether a similar phenomenon was identified in HCV infections for our cohort for patients who progress to chronic infection. We observed a decrease in population average relative fitness in the period of <90DPI with respect to the T/F virus in chronic subjects infected with HCV. The decrease in fitness correlated positively with IFN-γ ELISPOT responses and negatively with SE indicating that CD8+ T-cell responses drove the rapid emergence of immune escape variants, which initially reduced viral fitness. This is similarly reflected in HIV infected patients where strong CD8+ T-cell responses drove quicker emergence of immune escape variants, often accompanied by compensatory mutations (24).”

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      This has been added to the introduction section rather than Results as we feel that it is more appropriate to show why it is most applicable to HCV viral diversity in the background section of the manuscript. We write at lines 77-90:

      “Barton et al.’s [23] approach to understand HIV mutational landscape resulting in immune escape had two fundamental points: 1) replicative fitness depends on the virus sequence and the requirement to consider the effect of co-occurring mutations, and 2) evolutionary dynamics (e.g. host immune pressure). Together they pave the way to predict the mutational space in which viral strains can change given the unique immune pressure exerted by individuals infected with HIV. This model fits well with the pathology of HCV infection. For instance, HIV and HCV are both RNA viruses with rapid rate of mutation. Additionally, like HIV, chronic infection is an outcome for HCV infected individuals, however, unlike HIV, there is a 25% probability that individuals infected with HCV will naturally clear the virus. Previously published studies [9] have shown that HIV also goes through a genetic bottleneck which results in the T/F virus losing dominance and replaced by a chronic subtype, identified by the immune escape mutations. The concepts in Barton’s model and its functionality to assess the fitness based on the complex interaction between viral sequence composition and host immune response is also applicable to early HCV infection.”

      (3) Recognize the caveats of the HCV mapping data presented.

      We have now recognized the caveats of the HCV mapping data at lines 354-356: “While our findings here are promising, it should be recognized that although the bioinformatics tool (iedb_tool.py) proved useful for identifying potential epitopes, there could be epitopes that are not predicted or false-positive from the output which could lead to missing real epitopes.”

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      We have now clarified at lines 352-353 that the decline was toward “selected epitopes that showed evidence of escape”.

      Furthermore, we have cited two publications at line 352 that support our statement.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      We have now added a caveat in the discussion at lines 357-360, which reads:

      “In conclusion, this study provides initial insights into the evolutionary dynamics of HCV, showing that an early, robust CD8+ T-cell response without nAbs strongly selects against the T/F virus, enabling it to escape and establish chronic infection. However, these findings are preliminary and not exhaustive, warranting further investigation to fully understand these dynamics.”

      (6) What role does antigen drive play in these data - for both T cell and antibody induction?

      It is possible that HLA-adapted mutations could limit CD8 T cell induction if the HLAs were matched between transmission pairs, as has been shown previously for HIV (https://doi.org/10.1371/journal.ppat.1008177) with some data for HCV (https://journals.asm.org/doi/10.1128/jvi.00912-06). However, we apologise as we are not entirely sure that this is what the reviewer is asking for in this instance.

      (7) Figure 3 - are the X and Y axes wrongly labelled? The Divergent ranges of population fitness do not make sense.

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axes were wrongly labelled. This has now been resolved.

      (8) Figure S3 - is the green line, average virus fitness?

      This has now been clarified in Figure S3.

      (9) Use the term antibody epitopes, not B cell epitopes.

      We now use the term antibody epitopes throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) Introduction:

      Line 52: 'carry mutations B/T cell epitopes'. Two points

      i) These are antibody epitopes (and antibody selection) not B cell epitopes

      We have corrected this sentence at line 55 which now reads: “carry mutations within epitopes targeted by B cells and CD8+ T cells”.

      ii) To avoid confusion, add text that mutations were generated following selection in the donor.

      For HCV, it is unclear if mutations are generated following selection or have been occurring in low frequencies outside detection range. Only when selection by host immune pressure arises do the potentially low-frequency variants become dominant. However, we do acknowledge it is potentially misleading to only mention new variants replacing the transmitted/founder population. We have modified the sentence at line 52 to read:

      “At this stage either an existing variant that was occurring in low-frequency outside detection range or an existing variant with novel mutations generated following immune selection is observed in those who progress to chronic infection”

      - Lines 51-56: Human studies of escape and progression are associative, not causative as implied.

      Correct; the evidence linking escape and progression is currently associative. We have now corrected these lines so that they no longer suggest causation.

      - Line 65: Suggest you clarify your meaning of 'easier'?

      This sentence, now at line 72, has been modified to: “subtype 1b viruses have a higher probability to evade immune responses”

      (2) Results:

      - Line 147: Barton model (ref'd in Intro) is directly referred to here but not referenced.

      The reference has been added.

      - The authors should cite previous HIV literature describing associations between the rate of escape and Shannon Entropy e.g. the interaction between immunodominance, entropy, and rate of escape in acute HIV infection was described in Liu et al JCI 2013 but is not cited.

      We have now cited previous HIV research at line 147-151, adding Liu et al:

      “Additionally, the interaction between immunodominance, entropy, and escape rate in acute HIV infection has been described, where immunodominance during acute infection was the most significant factor influencing CD8+ T cell pressure, with higher immunodominance linked to faster escape (27). In contrast, lower epitope entropy slowed escape, and together, immunodominance and entropy explained half of the variability in escape timing (27).”

      - Line 319: The authors suggest that HCV-specific CD8 T cell response declines following early infection. On what are they basing this statement? The authors show their measured T cell responses decline but their approach uses selected epitopes and they are therefore unable to assess total HCV T cell response in participants (Where there is no escape, are T cell magnitudes maintained or do they still decline?). Can the authors cite other studies to support their statement?

      We have now clarified that the decline was toward “selected epitopes that showed evidence of escape”. Furthermore, we also cite two studies to support our findings.

      - Throughout the authors talk in terms of CD8 T cells but the ELISpot detects both CD4 and CD8 T cell responses. I suggest the authors be more explicit that their peptide design (9-10mers) is strongly biased to only the detection of CD8 T cells.

      To make this clearer and more explicit we have now added to the methods section at line 433-435:

      “While the ELISpot assay detects responses from both CD4 and CD8 T cells, our peptide design (9-10mers) is strongly biased toward CD8 T-cell detection. We have therefore interpreted ELISpot responses primarily in terms of CD8 T-cell activity.”

      - The points made in lines 307-321 could be more succinct

      We have now edited the discussion (lines 307 – 321) to make the points more succinct (now lines 307-323).

      Minor corrections to text, figures:

      - Figure 2: suggest making the Key bigger and more obvious.

      We have now made the key bigger and more obvious.

      - Figure 3 A & D....is there an error on the X-axis...are you really reporting ELISpot data of < 1 spot/10^6? Perhaps the X and Y axes are wrongly labelled?

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axes were wrongly labelled. This has now been resolved.

      - Figure 5: As this is PBMC, remove CD8 from the description of ELISpot. 

      We have now removed CD8 from the description of ELISpot in both Figure 5 and Figure S3.

      Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.

      We thank the reviewer for pointing out the weakness in our explanation and description of the fitness model. The model was generated using publicly released viral sequences, as described in a previous publication by Hart et al. 2015. The T/F virus from each subject chronically infected with HCV in our cohort was passed to the model by Hart et al. to estimate the initial viral fitness of the T/F variant. The fitness of the subvariants in the viral population at subsequent time points was also estimated for each subject using the same model (one per subtype). For each subject, these subvariant fitness values were divided by the fitness value of the initial T/F virus (hence the relative fitness at the earliest time points, with no mutations in the epitope regions, was 1.000). All other fitness values are therefore expressed relative to the T/F variant.

      We have further clarified this point in the methods section, “Estimating survival fitness of viral variants”, to better describe how the data for the model were sourced (Lines 465-499).
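
The normalisation described in this reply can be sketched in a few lines. This is only an illustration of dividing each variant's model fitness by the T/F fitness; the function name, the data layout (days post-infection mapped to a list of variant fitness values) and the numbers are hypothetical, and the fitness model of Hart et al. itself is not reproduced here.

```python
def relative_fitness(fitness_by_timepoint, tf_fitness):
    """Express model fitness values relative to the T/F virus.

    fitness_by_timepoint: dict mapping a time point (e.g. days post-infection)
        to a list of model fitness values, one per variant (hypothetical layout).
    tf_fitness: model fitness of the transmitted/founder (T/F) virus.
    """
    if tf_fitness == 0:
        raise ValueError("T/F fitness must be non-zero")
    # Dividing by the T/F value makes the T/F virus itself equal to 1.0.
    return {
        timepoint: [value / tf_fitness for value in values]
        for timepoint, values in fitness_by_timepoint.items()
    }


# Hypothetical values: the T/F virus at day 0, two variants at day 30.
example = relative_fitness({0: [2.0], 30: [1.0, 3.0]}, tf_fitness=2.0)
```

With these made-up inputs the T/F virus maps to 1.0, so values below 1.0 indicate variants the model scores as less fit than the T/F virus and values above 1.0 indicate the opposite.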

      To add to the reviewer’s point, we agree that sequence variants common to many individuals are likely to be well-tolerated by the virus and this event was observed in our findings as our data suggested that immune escape variants tended to revert to variants that were closer to the global consensus strain. Our previous publications have indicated that T/F viruses during transmission were variants that were “fit” for transmission between hosts, especially in cases where the donor was a chronic progressor, a single T/F is often observed. Progression to immune escape and adaptation to chronic infection in the new host has an in-between process of genetic expansion via replication followed by a bottleneck event under immune pressure where overall fitness (overall survivability including replication and exploring immune escape pathways) can change. Under this assumption we questioned whether the observation reported in HIV studies (i.e. mutation landscapes that allow HIV adaptation to host) also happens in HCV infections. Furthermore, the cohort used in this study is a rare cohort where patients were tracked from uninfected, to HCV RNA+, to seroconversion and finally either clearing the virus or progression to chronic infection. Thus, it is of importance to understand the difference between clearance and chronic progression.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would be helpful for readers to clearly discuss how fitness is defined and to distinguish between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      Thank you for pointing out the weakness of our definition of fitness. We have now clarified this in multiple sections of the paper: in the abstract at lines 18-21 and in the introduction at lines 64-69.

      These read:

      Lines 18-21: “However, this generic definition can be further divided into two categories where intrinsic fitness describes the viral fitness without the influence of any immune pressure and effective fitness considers both intrinsic fitness with the influence of host immune pressure.”

      Lines 64-69: “This generic definition of fitness can be further divided into intrinsic fitness (also referred to as replicative fitness), where the fitness of sequence composition of the variant is estimated without the influence of host immune pressure. On the other hand, effective fitness (from here on referred to as viral fitness) considers fundamental intrinsic fitness with host immune pressure acting as a selective force to direct mutational landscape (19)[REF], which subsequently influences future transmission events as it dictates which subvariants remain in the quasispecies.”

      One concern about the analysis is in the test of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also be helpful for readers to include a definition of entropy in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.

      We thank the reviewer for pointing out the ambiguity in describing average SE. This has been rectified by adding more information in the methods section (Lines 397 to 400):

      “Briefly, SE was calculated using the frequency of occurrence of SNPs based on per codon position, this was further normalized by the length of the number of codons in the sequence which made up respective protein. An average SE value was calculated for each time point in each protein region for all subjects until the fixation event.”

      To answer the reviewer’s question, we computed entropy at multiple time points preceding the observation of the escape mutation. The escape rate was calculated for the epitopes targeted by the immune response. We compared the average SE, based on the change at each codon position and normalised by protein length, for the region containing the epitope with the time it took to reach fixation. We observed that if the protein region had a higher rate of variation (i.e. higher average SE), an immune escape epitope also emerged more quickly. Since we took SE from the very first time point and all subsequent time points until fixation, we do not think that escape mutations already being present in the population would alter the association with the rate of escape, especially as these escape mutations were rarely observed at early time points. It is likely that the escape variant became observable because of host immune pressure; the SE therefore suggests the liberty of exploration in the mutation landscape. If the region were highly restrictive, such that any mutation would result in a failed variant, we should observe relatively lower values of average SE. In other words, the more variability that is allowed in the region, the greater the probability that the virus will find a solution to achieve immune escape.
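
To make the "average SE" definition concrete, a minimal sketch of a per-codon Shannon entropy averaged over the number of codons in a region follows. It assumes aligned, in-frame nucleotide sequences from a single time point, treats each distinct codon as a symbol, and uses log base 2; none of these choices is stated in the response, so they are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def codon_entropy(codons):
    """Shannon entropy (in bits) of the codons observed at one position."""
    counts = Counter(codons)
    total = len(codons)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def average_se(sequences):
    """Mean per-codon entropy over a protein-coding region.

    sequences: list of aligned, in-frame nucleotide strings of equal length
        (assumed input format, one string per sampled variant).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if length % 3 != 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    n_codons = length // 3
    total = 0.0
    for i in range(n_codons):
        # Collect the codon at position i from every sequence.
        column = [s[3 * i:3 * i + 3] for s in sequences]
        total += codon_entropy(column)
    # Normalise by the number of codons in the region.
    return total / n_codons
```

Under this definition a fully conserved region scores 0, and a region in which many codon positions vary scores higher, which is the sense in which a higher average SE reflects more freedom to explore the mutational landscape.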

      Reviewer #2 (Recommendations for the authors):

      In addition to the main points above, there are a few minor comments and suggestions about the presentation of the data.

      (1) It's not clear how, precisely, the model-based fitness has been calculated and normalized. It would be helpful for the authors to describe this explicitly. Especially in Figure 3, the plotted fitness values lie in dramatically different ranges, which should be explained (maybe this is just an error with the plot?).

      We have now clarified how the model-based fitness has been calculated and normalized in the method section “Estimating survival fitness of viral variants” at line 465-472.

      “The model used for estimating viral fitness has been previously described by Hart et al. (19). Briefly, the original approach used HCV subtype 1a sequences to generate the model for the NS5B protein region. To update the model for other regions (NS3 and NS2) as well as other HCV subtypes in this study, subtype 1b and subtype 3a sequences were extracted from the Los Alamos National Laboratory HCV database. An intrinsic fitness model was first generated for each subtype for the NS5B, NS3 and NS2 regions of the HCV polyprotein. Then, using longitudinally sequenced data from patients chronically infected with HCV as well as clinically documented immune escape to describe high viral fitness variants, we generated estimates of the viral fitness for subjects chronically infected with HCV in our cohort.”

      Our apologies, there was an error with the plot in Figure 3. This has now been resolved.

      (2) In different plots, the authors show every pairwise comparison of ELISPOT values, population fitness, average SE, and rate of escape. It may be helpful to make one large matrix of plots that shows all of these pairwise comparisons at the same time. This could make it clear how all the variables are associated with one another. To be clear, this is a suggestion that the authors can consider at their discretion.

      Thank you for the suggestion to create a matrix of plots for pairwise comparisons. While this approach could indeed clarify variable associations, implementing it is outside the scope of this project. We appreciate the idea and may consider it in future studies as we continue to expand on this work.

    1. Author response:

      We have reviewed the helpful feedback from the reviewers and would like to thank them for their careful consideration of our manuscript. By way of provisional response, we agree with many of the above points and plan to revise our manuscript accordingly.

      In an effort to replicate some of the heme trafficking-related experiments in the original paper using a C. elegans model of TDD, we were either unable to do so or demonstrated an alternative explanation for the findings we could partially reproduce. As the reviewers correctly point out, there were some methodological and reagent-related differences between the study by Sun et al. and our own that we will more directly highlight in a subsequent manuscript version. Additionally, where possible, we will attempt to replicate these experiments using the same protocol(s).

      We observed several phenotypic traits in the C. elegans model of TDD that were not described in prior studies. While we believe these features to be consistent with a bioenergetic problem in the worm, direct evidence for this is admittedly lacking in our original manuscript. We are actively engaged in experiments examining potential functions of HRG-9 and HRG-10 unrelated to heme trafficking and will consider which data best align with the scope of this study, thus warranting inclusion in a subsequent manuscript version. We will also provide a more comprehensive review of relevant data generated by other groups (e.g., lipid dysregulation, impaired autophagy, mitochondrial dysfunction in the absence of TANGO2) in the discussion section.

      Recommended improvements related to figure legends, terminology, and formatting will also be executed in our forthcoming version. On behalf of my co-authors and myself, thank you again for your time and effort improving this work.

    1. Author response:

      We thank both reviewers for their time and effort in considering our manuscript. We are pleased that the reviewers recognised the strength of our theoretical analysis and found it "elegant" and "reasonably accessible". We also acknowledge the suggestions made by both reviewers that the manuscript could be improved by more discussion of potential experiments. We were concerned not to make the original manuscript too long but, in the light of the reviewers' comments, we will submit a revised version with more details of the kinds of experiments that would build on the results that we have presented.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      There are some errors in the methodology, that require revisions.

      In particular, the main conclusions drawn by the authors refer to the Mendelian Randomization analyses. However, the authors made a few errors here that need to be reconsidered:

      (1) Many of the outcomes investigated by the authors are continuous outcomes, while the authors report odds ratios. This is not correct and should be revised.

      Thank you for your observation. We have revised the manuscript to ensure that the results for continuous outcomes are appropriately reported using beta coefficients, which indicate the change in the outcome per unit increase in exposure. This will accurately reflect the nature of the analysis and provide a clearer interpretation of continuous outcomes (lines 56-109).

      (2) Some of the odds ratios (for example the one for osteoporosis) are really small, while still reaching the level of statistical significance. After some checking, I found the GWAS data used to generate these MR estimates were processed by the program BOLT-LMM. This program is a linear mixed model program, which requires the transformation of the beta estimates to be useful for dichotomous outcomes. The authors should check the manual of BOLT-LMM and recalculate the beta estimates of the SNP-outcome associations prior to the Mendelian Randomization analyses. This should be checked for all outcomes as it doesn't apply to all.

      Thank you for your detailed feedback. We have reviewed all the GWAS data used in our MR analyses and confirmed that all GWAS of continuous traits have already been processed using the BOLT-LMM, including age at menarche, age at first birth, BMI, frailty index, father's age at death, mother's age at death, DNA methylation GrimAge acceleration, age at menopause, eye age, and facial aging. Most of the dichotomous outcomes have not been processed by BOLT-LMM, including late-onset Alzheimer's disease, type 2 diabetes, chronic heart failure, essential hypertension, cirrhosis, chronic kidney disease, early onset chronic obstructive pulmonary disease, breast cancer, ovarian cancer, endometrial cancer, and cervical cancer, except osteoporosis. We have reprocessed the GWAS beta values of osteoporosis and re-conducted the MR analysis (lines 74-75; lines 366-373).
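
      The rescaling the reviewer refers to can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the approximation given in the BOLT-LMM documentation, in which a linear-scale effect estimate for a case-control trait is divided by mu * (1 - mu), where mu is the case fraction, to obtain an approximate log odds ratio. The function name and the example numbers are hypothetical.

```python
import math


def linear_beta_to_log_or(beta, se, case_fraction):
    """Approximate a log odds ratio (and its SE) from a linear-scale estimate.

    Assumes the rescaling log(OR) ~= beta / (mu * (1 - mu)), where mu is the
    fraction of cases in the GWAS sample. The standard error is rescaled by
    the same factor.
    """
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    scale = case_fraction * (1.0 - case_fraction)
    return beta / scale, se / scale


# Hypothetical example: a small linear-scale beta in a sample with 10% cases.
log_or, log_or_se = linear_beta_to_log_or(beta=0.009, se=0.0009, case_fraction=0.1)
odds_ratio = math.exp(log_or)
```

      Because the rescaling factor is at most 0.25, untransformed linear-scale betas are always smaller in magnitude than the corresponding log odds ratios, which is consistent with the very small odds ratios the reviewer noticed.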

      (3) The authors should follow the MR-Strobe guidelines for presentation.

      Thank you for your suggestion to follow the MR-STROBE guidelines for the presentation of our study. We appreciate the importance of adhering to these standardized guidelines to ensure clarity and transparency in reporting Mendelian Randomization (MR) analyses. We confirm that the MR components of our research are structured and presented following the MR-STROBE checklist. In addition to the MR analyses, our study also integrates Colocalization analysis, Genetic correlation analysis, Ingenuity Pathway Analysis (IPA), and population validation to provide a more comprehensive understanding of the genetic and biological context. While these analyses are not strictly covered by MR-STROBE guidelines, they complement the MR results by offering additional validation and mechanistic insights.

      We have structured our manuscript to separate these complementary analyses from the core MR results, maintaining alignment with MR-STROBE for the MR-specific components. The additional analyses are discussed in dedicated sections to highlight their unique contributions and avoid conflating them with the MR findings.

      (4) The authors should report data in the text with a 95% confidence interval.

      Thank you for your feedback. We have added the 95% confidence intervals for the reported data within the main text to enhance clarity and provide comprehensive context (lines 56-109). Additionally, the complete analysis data, including all detailed results, can be found in Table S3.

      (5) The authors should consider correction for multiple testing

      Thank you for your comment regarding the need to consider correction for multiple testing. We agree that correcting for multiple comparisons is an important step to control for the possibility of false-positive findings, particularly in studies involving large numbers of statistical tests. In our study, we carefully considered the issue of multiple testing and adopted the following approach:

      Context of Multiple Testing: The tests we conducted were hypothesis-driven, focusing on specific relationships (e.g., genetic correlation, colocalization, and Mendelian Randomization). These analyses are based on a priori hypotheses supported by existing literature or biological relevance.

      Statistical Methods: Where applicable, we applied appropriate measures to account for multiple tests. For instance, in Mendelian Randomization, sensitivity analyses serve to validate the robustness of the results.

      We believe that the methodology and corrections applied in our study appropriately address concerns about multiple testing, given the hypothesis-driven nature of our analyses and the rigorous steps taken to validate our findings. If you feel that additional corrections are required for specific parts of the analysis, we would be happy to further clarify or revise as needed.
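
      For readers unfamiliar with what a formal correction would involve, the sketch below shows two standard procedures, Bonferroni and Benjamini-Hochberg. It is purely illustrative: the response does not state that either procedure was applied, and the function names and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Finds the largest rank k such that the k-th smallest p-value is at most
    (k / m) * alpha, then rejects every test ranked k or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four tests.
p_values = [0.01, 0.02, 0.03, 0.04]
strict = bonferroni(p_values)
fdr = benjamini_hochberg(p_values)
```

      Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and typically retains more findings when many tests are run.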

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identified 128 fertility-related SNPs that are associated with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Points that have to be clarified/addressed:

      (1) The antagonistic pleiotropy is an evolutionary theory pointing to the possibility that mutations that are beneficial for fitness (early life health and reproduction) may be detrimental later in life. As it concerns an evolutionary process and the authors focus on contemporary data from a single generation, more context is necessary on how this theory is accurately testable. For example, why and how much natural variation is there for fitness outcomes in humans?

      Thank you for these insightful questions. We appreciate the opportunity to clarify how we approach the testing of AP theory within a contemporary human cohort and address the evolutionary context and comparative considerations with the disposable soma theory.

      We recognize that modern human populations experience selection pressures that differ from those in the past, which may affect how well certain genetic variants reflect historical fitness benefits. Nonetheless, the genetic variation present today still offers valuable insights into potential AP mechanisms through statistical associations in contemporary cohorts. We believe that AP can indeed be explored in current populations by examining genetic links between reproductive traits and age-related health outcomes. In our study, we investigate whether certain genetic variants linked to reproductive timing—such as age at menarche and age at first birth—also correlate with late-life health risks. By identifying SNPs associated with both early-life reproductive success and adverse aging outcomes, we aim to capture the evolutionary trade-offs that AP theory suggests.

      Despite contemporary selection pressures that differ from historical conditions, there remains natural genetic variation in traits like reproductive timing and longevity in humans today. This diversity allows us to apply MR to test causal relationships between reproductive traits and aging outcomes, providing insights into potential AP mechanisms. Prior studies have demonstrated that reproductive behaviors exhibit significant heritability and have identified genetic loci associated with reproductive timing (1,2). This genetic variation facilitates causal inference in modern cohorts, despite environmental and healthcare advances that might modulate these associations (3). By leveraging genetic risk scores for reproductive timing, our study captures the necessary variability to assess potential AP effects, thus providing valuable insights into how evolutionary trade-offs may continue to influence human health outcomes.

      What do the genetic risk score distributions of the exposure data look like?

      Thank you for your question. Our study is focused on Mendelian Randomization (MR) analysis, which aims to infer causal relationships between exposures and outcomes. While genetic risk scores (GRS) provide valuable insights at an individual level, they do not directly align with our study's objective, which is centered on population-level causal inference rather than individual-level genetic risk assessment. In MR, we use genetic variants as instrumental variables to determine the causal effect of an exposure on an outcome. GRS analysis typically focuses on summarizing an individual's risk based on multiple genetic variants, which is outside the scope of our current research. Therefore, we did not perform or analyze the distribution of genetic risk scores, as our primary goal was to understand broader causal relationships using established genetic instruments.

      Also, how can the authors distinguish in their data between the antagonistic pleiotropy theory and the disposable soma theory, which considers a trade-off between investment in reproduction and somatic maintenance and can be used to derive similar hypotheses? There is just a very brief mention of the disposable soma theory in lines 196-198.

      In our manuscript, we test AP theory specifically by examining genetic variants associated with reproductive timing and their association with age-related health risks in later life. MR and genetic risk scores allow us to assess these associations, directly testing the hypothesis that certain alleles enhancing reproductive success might have adverse effects on aging outcomes. This gene-centered approach aligns with AP’s premise of genetic trade-offs, enabling us to observe whether alleles associated with early-life reproductive traits correlate with increased risks of age-related diseases. Distinguishing from disposable soma theory, which would predict a general trade-off in energy allocation affecting somatic maintenance and not specific genetic effects, our data focuses on how certain alleles have differential impacts across life stages. Our findings thus support AP theory over disposable soma by highlighting the effects of specific genetic loci on both reproductive and aging phenotypes. However, future research could indeed explore the intersection of these theories, for example, by examining how resource allocation and genetic predispositions interact to influence longevity in various environmental contexts.

      (2) The antagonistic pleiotropy theory, used to derive the hypothesis, does not necessarily distinguish between male and female fitness. Would the authors expect that their results extrapolate to males as well? And can they test that?

      Emerging evidence suggests that early puberty in males is linked to adverse health outcomes, such as an increased risk of cardiovascular disease, type 2 diabetes, and hypertension in later life (4). A Mendelian randomization study also reported a genetic association between the timing of male puberty and reduced lifespan (5). These findings support the hypothesis that genetic variants associated with delayed reproductive timing in males might similarly confer health benefits or improved longevity, akin to the patterns observed in females. This would suggest that similar mechanisms of antagonistic pleiotropy could operate in males as well.

      In our study, BMI was identified as a mediator between reproductive timing and disease risk. Given that BMI is a common risk factor for age-related diseases in both males and females (6-9), it is plausible that similar mechanisms involving BMI, reproductive timing, and disease risk could exist in males. This shared mediator points to the possibility that, while reproductive timelines may differ, the pathways through which these traits influence aging outcomes may be consistent across genders.

      AP theory could potentially be tested in males, as the principles of the theory may extend to analogous reproductive traits in males, such as age at puberty and testosterone levels, which could similarly influence health outcomes later in life. However, as our current study focuses specifically on female reproductive traits, testing the AP theory in males is outside the scope of this work. We acknowledge the importance of exploring these mechanisms in males, and we hope that future research will address this by investigating male-specific reproductive traits and their relationship to aging and health outcomes.

      (3) There is no statistical analyses section providing the exact equations that are tested. Hence it's not clear how many tests were performed and if correction for multiple testing is necessary. It is also not clear what type of analyses have been done and why they have been done. For example in the section starting at line 47, Odds Ratios are presented, indicating that logistic regression analyses have been performed. As it's not clear how the outcomes are defined (genotype or phenotype, cross-sectional or longitudinal, etc.) it's also not clear why logistic regression analysis was used for the analyses.

      Thank you for your thoughtful comments regarding the statistical analyses and the clarification of methods and variables used in the study.

      Statistical Analyses Section: We have included a detailed explanation of all statistical analyses in the Methods section (lines 291–408), specifying the rationale for the choice of methods, the variables analyzed, and their relationships. Additionally, we have provided the relevant equations or statistical models used where appropriate to ensure transparency.

      Beta Values and Odds Ratios: In the Results section (starting at line 56), both Beta values and Odds Ratios are presented: Beta values were used for analyses of continuous outcomes to quantify the linear relationship between predictors and outcomes. Odds Ratios (ORs) were calculated for binary or categorical disease outcomes to describe the relative odds of an outcome given specific exposures or independent variables.

      Validation and Regression Analyses: For further validation of the MR results, we conducted analyses using the UK Biobank dataset (starting at line 162). Logistic regression analysis was then employed for disease risk assessments involving categorical outcomes (e.g., diseased or not).

      We hope that this clarifies the methods and their applicability to our study, as well as the rationale for the presentation of Beta values and Odds Ratios. If further details or refinements are required, we are happy to incorporate them.
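
      The distinction between the two effect measures can be illustrated with a short sketch. It assumes a normal approximation for the 95% confidence interval (z = 1.96): a beta for a continuous outcome is reported on its own scale, whereas a log-odds estimate for a binary outcome is exponentiated to give an odds ratio. The function names and numbers are hypothetical and are not taken from the manuscript.

```python
import math


def beta_with_ci(beta, se, z=1.96):
    """Effect on a continuous outcome: change in outcome per unit of exposure."""
    return beta, beta - z * se, beta + z * se


def odds_ratio_with_ci(log_odds, se, z=1.96):
    """Effect on a binary outcome: exponentiate the log-odds estimate and its bounds."""
    return (
        math.exp(log_odds),
        math.exp(log_odds - z * se),
        math.exp(log_odds + z * se),
    )


# Hypothetical estimates for one continuous and one binary outcome.
continuous = beta_with_ci(beta=-0.12, se=0.03)
binary = odds_ratio_with_ci(log_odds=-0.12, se=0.03)
```

      A confidence interval that excludes 0 for a beta, or 1 for an odds ratio, corresponds to nominal significance at the 5% level.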

      (4) Mendelian Randomization is an important part of the analyses done in the manuscript. It is not clear to what extent the MR assumptions are met, how the assumptions were tested, and if/what sensitivity analyses are performed; e.g. reverse MR, biological knowledge of the studied traits, etc. Can the authors explain to what extent the genetic instruments represent their targets (applicable expression/protein levels) well?

      Thank you for your insightful comments regarding the Mendelian Randomization (MR) analysis and the evaluation of its assumptions. Below, we provide additional clarification on how the MR assumptions were addressed, sensitivity analyses performed, and the representativeness of the genetic instruments (starting at line 314):

      Relevance Assumption (Genetic instruments are associated with the exposure): “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000).” “During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12).”

      Independence Assumption (Genetic instruments are not associated with confounders, Genetic instruments affect the outcome only through the exposure): Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded.

      Sensitivity Analyses Performed: A pleiotropy test was used to check whether the IVs influence the outcome through pathways other than the exposure of interest. A heterogeneity test was applied to assess whether the causal effect estimates vary across different IVs. Significant heterogeneity test results indicate that some instruments are invalid or that the causal effect varies depending on the IVs used. MR-PRESSO was applied to detect and correct potential outliers of IVs with NbDistribution = 10,000 and threshold p = 0.05. Outliers would be excluded for repeated analysis. The causal estimates were given as odds ratios (ORs) and 95% confidence intervals (CI). A leave-one-out analysis was conducted to ensure the robustness of the results by sequentially excluding each IV and confirming the direction and statistical significance of the remaining SNPs.

      Supplemental post-GWAS analysis: Colocalization analysis (starting at line 356), Genetic correlation analysis (starting at line 366).

      Our MR analysis adheres to the guidelines for causal inference in MR studies. By combining multiple sensitivity analyses and ensuring the quality of genetic instruments, we demonstrate that the results are robust and unlikely to be driven by confounding or pleiotropy.
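
      The leave-one-out check described above can be sketched with a fixed-effect inverse-variance weighted (IVW) estimator. This is a minimal illustration under stated assumptions, not the authors' pipeline: the response does not specify the estimator, so fixed-effect IVW is an assumption here, and the function names and summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    Equivalent to a weighted regression of SNP-outcome effects on
    SNP-exposure effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / s ** 2
        for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """Re-estimate the causal effect with each SNP removed in turn."""
    results = []
    for i in range(len(beta_exposure)):
        results.append(
            ivw_estimate(
                beta_exposure[:i] + beta_exposure[i + 1:],
                beta_outcome[:i] + beta_outcome[i + 1:],
                se_outcome[:i] + se_outcome[i + 1:],
            )
        )
    return results


# Hypothetical summary statistics for three instruments.
bx = [0.1, 0.2, 0.3]
by = [0.05, 0.1, 0.15]
se_y = [0.01, 0.01, 0.01]
estimate, estimate_se = ivw_estimate(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
```

      If any single leave-one-out estimate changes sign or loses significance, that instrument is a candidate outlier and would merit closer inspection.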

      (5) It is not clear what reference genome is used and if or what imputation panel is used. It is also not clear what QC steps are applied to the genotype data in order to construct the genetic instruments of MR.

      Starting in line 314, the steps of SNP selection were included in the Methods section. “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000). Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded. During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12). If the effect allele frequency (EAF) was missing in the primary dataset, EAF would be collected from dbSNP (https://www.ncbi.nlm.nih.gov/snp/) based on the population to calculate the F value.” The SNP numbers of exposures for each outcome and F statistics results were listed in supplemental table S2.
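
      The role of the effect allele frequency in the F statistic can be illustrated with one commonly used approximation. The quoted Methods text does not give the exact formula, so the following is an assumption: the variance explained by a SNP is taken as R² = 2 × EAF × (1 − EAF) × β², with β in standard-deviation units of the exposure, and F = R²(N − k − 1) / ((1 − R²) k) for k instruments and sample size N. Function names and numbers are hypothetical.

```python
def variance_explained(beta, eaf):
    """Approximate R^2 for one SNP, assuming beta is in SD units of the exposure."""
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


def f_statistic(r_squared, n_samples, n_instruments=1):
    """F statistic for instrument strength; values above 10 are conventionally adequate."""
    return (r_squared * (n_samples - n_instruments - 1)) / (
        (1.0 - r_squared) * n_instruments
    )


# Hypothetical SNP: beta = 0.1 SD per allele, EAF = 0.5, in 10,002 samples.
r2 = variance_explained(beta=0.1, eaf=0.5)
f_value = f_statistic(r2, n_samples=10002)
is_strong = f_value > 10
```

      Under this approximation a missing EAF makes R² undefined, which is why an external frequency source is needed before the F value can be computed.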

      (6) A code availability statement is missing. It is understandable that data cannot always be shared, but code should be openly accessible.

      We have added it to the manuscript (starting at line 410).

      Reviewer #2 (Recommendations for the authors):

      (1) The outcomes seem to be genotypes (lines 274-288). In MR, genotypes are used as an instrument, representing an exposure, which is then associated with an outcome that is typically observed and measured at a later moment in time than the predictors. If both exposure and outcome are genotypes it is not clear how this works in terms of causality; it would rather reflect a genetic correlation. One would expect the genotypes that function as instruments for the exposure to have a functional cascade of (age-related) effects, leading to an (age-related) outcome. From line 149 the outcomes seem to be phenotypes. Can the authors please clearly explain in each section what is analyzed, how the analyses were done, and why the analyses were done that way?

      Thank you for your insightful comment. We understand the concern regarding the use of genotypes as both exposures and outcomes and the implications this has for interpreting causality versus genetic correlation. To clarify, in our study, the outcomes analyzed in the MR framework are indeed genotypes, starting from line 47. We use genotypes as instrumental variables for exposures, which are then linked to phenotypic outcomes observed at a later stage, in line with standard MR principles.

      To improve the robustness of the MR results, we validated the genetic associations in the population with phenotype data from UK Biobank (lines 162-203), and the detailed methods were listed in lines 385-408.

      (2) Overall, the English writing is good. However, some small errors slipped in. Please check the manuscript for small grammar mistakes like in sentences 10 (punctuation) and 33 (grammar).

      Thank you for your feedback. We appreciate your careful review and attention to detail. We thoroughly rechecked the manuscript for any grammatical errors, including punctuation and sentence structure, especially in sentences 11 and 35 in the revised manuscript, as suggested.

      (3) There is currently no results and discussion section.

      The manuscript was submitted as Short Reports article type with a combined Results and Discussion section. We have added the section title of Discussion.

      (4) Why did the authors not include SNPs associated with age at menopausal onset? See for example: https://www.nature.com/articles/s41586-021-03779-7.

      Thank you for your information. Our manuscript focuses on the antagonistic pleiotropy theory, which posits an inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (like menarche and childbirth) may have costly consequences later. So, we only included age at menarche and age at first childbirth as exposures in our research.

      (5) Can the authors include genetic correlations between menarche, age at first child, BMI, and preferably menopause?

      Thank you for your suggestion. We acknowledge that including genetic correlations between age at menarche, age at first childbirth, BMI, and menopause can provide valuable context to our analysis. While our current MR study sets age at menarche and age at first childbirth as exposures and menopause as the outcome, and we have already included results that account for BMI-related SNPs before and after correction, we recognize the importance of assessing genetic correlations.

      To address this, we calculated the genetic correlations between these traits to provide insight into their shared genetic architecture. This analysis helps clarify whether there is a significant genetic overlap between the two exposures and between exposure and outcome, which can inform and support the interpretation of our MR results. We appreciate your suggestion and include these calculations to enhance the robustness and comprehensiveness of our study. In the genetic correlation analysis, LDSC software was applied and the genetic correlation values for all pairwise comparisons among age at menarche, age at first birth, BMI, and age at menopause onset were calculated (15,16). The results are listed in Table S6.

      (6) Line 39-40: that is not entirely true. There is also mounting evidence that socioeconomic factors cause earlier onset of menarche through stress-related mechanisms: https://doi.org/10.1016/j.annepidem.2010.08.006

      Thank you so much for your information. We changed it to “Considering reproductive events are partly regulated by genetic factors that can manifest the physiological outcome later in life”.

      (7) Why did the authors choose to work with studies derived from IEU Open GWAS, as it often does not contain the most recent and relevant GWAS for a specific trait?

      We chose to work with studies derived from the IEU Open GWAS database after careful consideration of several sources, including the GWAS Catalog database and recently published GWAS papers. Our selection criteria focused on publicly available GWAS with large sample sizes and a higher number of SNPs to ensure robust analysis. For specific traits such as late-onset Alzheimer's disease and eye aging, we used GWAS data published in scientific articles to ensure that our research reflects the latest findings in the field.

      (1) Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat Genet 48, 1462-1472 (2016). https://doi.org/10.1038/ng.3698

      (2) Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat Hum Behav 1, 757-765 (2017). https://doi.org/10.1038/s41562-017-0195-1

      (3) Stearns, S. C., Byars, S. G., Govindaraju, D. R. & Ewbank, D. Measuring selection in contemporary human populations. Nat Rev Genet 11, 611-622 (2010). https://doi.org/10.1038/nrg2831

      (4) Day, F. R., Elks, C. E., Murray, A., Ong, K. K. & Perry, J. R. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep 5, 11208 (2015). https://doi.org/10.1038/srep11208

      (5) Hollis, B. et al. Genomic analysis of male puberty timing highlights shared genetic basis with hair colour and lifespan. Nat Commun 11, 1536 (2020). https://doi.org/10.1038/s41467-020-14451-5

      (6) Field, A. E. et al. Impact of overweight on the risk of developing common chronic diseases during a 10-year period. Arch Intern Med 161, 1581-1586 (2001). https://doi.org/10.1001/archinte.161.13.1581

      (7) Singh, G. M. et al. The age-specific quantitative effects of metabolic risk factors on cardiovascular diseases and diabetes: a pooled analysis. PLoS One 8, e65174 (2013). https://doi.org/10.1371/journal.pone.0065174

      (8) Kivimaki, M. et al. Obesity and risk of diseases associated with hallmarks of cellular ageing: a multicohort study. Lancet Healthy Longev 5, e454-e463 (2024). https://doi.org/10.1016/S2666-7568(24)00087-4

      (9) Kivimaki, M. et al. Body-mass index and risk of obesity-related complex multimorbidity: an observational multicohort study. Lancet Diabetes Endocrinol 10, 253-263 (2022). https://doi.org/10.1016/S2213-8587(22)00033-X

      (10) Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet 50, 912-919 (2018). https://doi.org/10.1038/s41588-018-0152-6

      (11) Gao, X. et al. The bidirectional causal relationships of insomnia with five major psychiatric disorders: A Mendelian randomization study. Eur Psychiatry 60, 79-85 (2019). https://doi.org/10.1016/j.eurpsy.2019.05.004

      (12) Burgess, S., Small, D. S. & Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res 26, 2333-2355 (2017). https://doi.org/10.1177/0962280215597579

      (13) Staley, J. R. et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207-3209 (2016). https://doi.org/10.1093/bioinformatics/btw373

      (14) Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 35, 4851-4853 (2019). https://doi.org/10.1093/bioinformatics/btz469

      (15) Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015). https://doi.org/10.1038/ng.3406

      (16) Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015). https://doi.org/10.1038/ng.3211

    1. Author response:

      We thank the reviewers for their thoughtful comments and suggestions. We plan to make a number of revisions to the manuscript to address their feedback.

      Firstly, we plan to incorporate feedback related to our modeling approach. We will provide justification for the chosen models and why this dataset is not appropriate for an in-depth exploration of other models. In particular, we will highlight that the models included in this manuscript were taken from Langdon et al. (2019) with a minor extension. Model development and validation in the Langdon et al. (2019) paper required a dataset with >100 rats per task. As the current n per variant is 28-32, and behavioral performance on this task is highly variable, it would be difficult to sufficiently test the validity of models that majorly depart from the previously tested RL models. Nevertheless, we will acknowledge this as a limitation in the discussion section. Additionally, we will test some alternatives suggested by reviewers that fall within the scope of the current RL modeling framework (e.g., comparison to a standard delta-rule update for unrewarded choices). We will address other concerns brought up by reviewers by a.) providing a rationale for why we constrained our analyses to the first five sessions, b.) simulating data for sessions that match those that were analyzed in the real data (i.e., sessions 35-40 instead of 18-20), and c.) including a figure of the simulated choice probabilities rather than just risk score.

      Secondly, we will include additional analyses and clarify the current statistical approach to address comments on how the data were analyzed. We will include an analysis of task acquisition to investigate when choice preferences emerge across the different variants. We will justify the statistical approach used for detecting behavioral differences between task variants, including a better explanation of the inclusion of the risky/optimal label as a between-subjects factor in the ANOVAs. We will also expand the section on parameters predicting risk preference on the rGT to fully explain the statistical method used and provide a figure of the results.

      Lastly, we will provide a more detailed rationale for the reinforcer devaluation test, and describe the hypothesis it tests. We will also expand on how the results from the devaluation test support our conclusions, and address alternative explanations suggested by the reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      (1) As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate.

      I see no reason to include these inaccurate measurements.  

      We agree with the Referee and removed the indicated figure (old Supplementary Fig. 4) and data.

      Reviewer #3:

      (1) It would be interesting to comment on how the addition of a coverslip changes the performance of the uncorrected microendoscope compared to the use of bare grin lenses. 

      We modified the discussion section (page 18) and added a new reference (#36) to address the Referee’s request.

      (2) In Figure 6C-H, the authors can indeed show data corresponding to all detected cells, but I still think that the statistics should be calculated using the same effective FOV. 

      We modified the Figure 6 legend to address the Referee’s request.

      (3) Authors could present the images in Figures 4-6 as in the original version, with a scale bar in the centre of the FOV that is different for the two types of objectives (corrected vs uncorrected). They could add a short justification for this choice, and perhaps present the other version for Figure 4 in a supplementary information sheet (with similar scale bars at the centre of the FOV for both types of objectives). It would allow readers to appreciate that the FOV still appears significantly enlarged with this other presentation.

      As requested by the Referee, we modified the text in the Result section (page 11) and added the additional version of Figure 4 as Figure 4-figure supplement 1.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding the penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding concerning prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by the demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even the eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using the yellow laser as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC biology 17.1 (2019): 1-17.), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the reviewer’s thorough evaluation, which thoughtfully highlights the strengths and areas for improvement in our study.

      We agree with the reviewer’s recognition of the novelty of our approach, particularly in specifically perturbing climbing fiber (CF) activity in the flocculus and examining its effects across distinct phases of learning. Additionally, our use of the well-established OKR behavior paradigm provides a robust framework for investigating cerebellar learning processes, further strengthening our study.

      To address concerns regarding the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we conducted additional experiments. These include in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase. To ensure precise targeting and mitigate potential side effects, such as unintended modification of Purkinje cell (PC) simple spike activity, we demonstrated that optogenetic suppression of CF transmission did not affect simple spike firing. Furthermore, we made additional characterizations to confirm the specificity of viral targeting.

      Lastly, we recognize the importance of exploring alternative mechanisms underlying CF involvement in cerebellar learning. Accordingly, we expanded the manuscript to provide a more comprehensive discussion of these mechanisms, offering a clearer perspective on the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weakens the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CF in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in-vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, which is nearly eight times stronger than the intensity we used in our OKR experiment. Furthermore, if there had been tissue damage from chronic laser stimulation, we would expect to see impaired long-term memory reflected in abnormal gain retrieval results tested the following day. However, as shown in Figures 2 and 3, there were no significant abnormalities in consolidation percentages even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Inhibitory optogenetic actuators are generally problematic, especially in time frames longer than seconds. If the authors wish to be able to inhibit activity in the flocculus-targeting CFs for a long time, maybe it would make sense to try to retrogradely transfect the IO neurons from the flocculus (using a cre-lox approach) with inhibitory DREADDs. This approach is also full of problems, so the absence or significant decrease in CS activity throughout the period of manipulation must be demonstrated.

      In addition to re-examining the strength of the evidence regarding the role of CFs in the consolidation and retrieval phases, the manuscript would benefit from significant reworking of the details in the manuscript and figures. Below is a possibly incomplete list of things we would want to highlight:

      (1) While the text states the authors "... verified the potential reduction of Cs firing rate in PCs of awake mice in vivo by inhibiting CF signals", neither the data nor a figure is shown. This is of critical importance when judging the reliability of the following results. The data presented in panels Figure 1D-E should also be improved to be more informative; specifically, the waveforms of EPSCs should be shown in higher resolution. We are not informed about how many cells/slices/animals the results are obtained from, nor how many trials were done per condition. Finally, the in vitro data is from vermal Purkinje neurons, while the focus of the work is in the flocculus. Please provide these verifications for the flocculus.

      To verify the suppression of complex spike (Cs) activity, we conducted additional in-vivo experiments and added Figure 2, which presents recordings of Cs firing rates from Purkinje cells (PCs) during optogenetic suppression of climbing fiber (CF) activity. These data demonstrate that the suppression specifically and robustly targets Cs activity without affecting simple spike firing, as shown in Figure 2C. The results presented in Figure 2 were acquired at 40 minutes of optostimulation, consistently showing effective suppression of Cs activity throughout this period. While continuous recordings over several hours were not performed, the stability and sustained suppression observed at the 40-minute mark strongly suggest that the manipulation remains effective during the extended durations required for the behavioral tests.

      Additionally, we have improved Figure 1D by enhancing the resolution of EPSC waveforms and including more detailed information in the figure legend regarding the number of cells and animals analyzed. For the current-clamp mode data (Figures 1E and F), we clarified the experimental conditions to provide additional context. While the in vitro data were collected from vermal PCs, these experiments were intended to illustrate the fundamental properties of CF-PC transmission.

      (2) It is challenging to get a homogenous transfection of all CFs in a given region. To be able to judge the significance of the results, the readers should be provided with material allowing assessing the transfection quality. The images shown in panels Bi-ii are spatially restricted and of too low quality to make judgements. Also, it is not stated whether the images shown are from GFP or NpHR-transfected animals. These different payloads are delivered using different viral capsids (AAV1 vs. AAV9) that have significantly different transfection capacities and results from AAV9-CamKIIGFP cannot be generalized to AAV1-CamKII-NpHR. Please show the expression for the capsid used with NpHR.

      To clarify, the images in Figure 1Bi-ii are representative of GFP expression in animals transfected using AAV1-CamKII-EGFP. The purpose of these panels is to confirm the successful targeting of the region of interest rather than to evaluate viral tropism or capsid-specific transfection efficiency. Moreover, while the transfection characteristics of AAV1 and AAV9 may differ, the key experimental parameter of effective CF suppression was validated through in-vivo electrophysiological recordings, which robustly confirm the efficacy of NpHR expression.

      (3) Finally, please show the location of the optic fiber implant in the flocculus from post-mortem images.

      In Figure 3a of our revised manuscript, we added post-mortem histological images showing the exact location of the optic fiber implants in the flocculus. These images provide clear confirmation that the optogenetic stimulation was targeted to the correct anatomical region, ensuring that the observed effects are attributable to CF manipulation in the flocculus.

      Reviewer #2 (Recommendations For The Authors):

      (1) The efficacy of CF suppression is questionable. The histology in Figure 1 shows that only a handful of CFs are transduced in their approach. This observation casts doubt on the claimed complete suppression of CF-evoked EPSCs in every recorded PC in the same figure. This necessitates a more detailed explanation for this apparent discrepancy. Also, the absence of current-clamp recordings to measure the effect on CF-evoked complex spiking in PCs and the lack of detail regarding the timing of optogenetic actuation (continuous or pulsed) during these slice experiments are also significant omissions.

      We are providing additional in vivo electrophysiological recordings showing sustained CF suppression in awake animals (Figure 2). These recordings will directly demonstrate the extent of CF-evoked complex spike (Cs) suppression.

      Moreover, we have included additional data of current-clamp recordings to measure the impact of CF suppression on Cs activity (Figures 1E and 1F). Regarding the timing of the optogenetic actuation, the stimulation was applied continuously in the slice experiments.

      (2) The authors claim that their method effectively suppresses CF activity in vivo, yet they do not present any supporting data. Given the histological evidence provided, it's questionable whether their approach truly impacts the CF population broadly, casting doubts on the efficacy of their suppression approach to identify the role of CFs during behavior. To address these concerns, further experiments and detailed quantification are essential to validate the extent and uniformity of CF suppression achieved.

      As we responded earlier, we conducted additional in-vivo experiments with continuous recordings of CF-evoked complex spike (Cs) activity during optogenetic suppression (Figure 2). These data directly demonstrate effective and sustained inhibition of CF transmission throughout the behavioral experiments. Quantification of CF suppression revealed consistent inhibition across the manipulation period, with no observable alterations in Purkinje cell simple spike firing rates, confirming that our intervention specifically targeted CF activity without off-target effects. In addition to the in-vivo data, the in-vitro data presented in Figure 1 (lines 107~116) further validate the efficacy of our optogenetic manipulation, showing consistent suppression of CF transmission without any failures. These findings collectively confirm the reliability and specificity of our suppression approach for studying CF contributions to behavior.

      (3) To optogenetically test the role of CFs in memory consolidation, the authors deliver continuous, high-power light to the flocculus (13 mW for 6 hrs). This extends well beyond typical experimental conditions. The sustained nature of the light exposure thus brings into question the consistency and reliability of CF suppression over time. Firstly, it is imperative to determine whether CF activity is suppressed throughout this extended period. Secondly, the intensity and duration of light exposure carry a significant risk of causing extensive damage to the surrounding tissue. Given these concerns, a thorough histological examination is warranted to assess the potential adverse effects on tissue integrity. Such an analysis is crucial not only for validating the experimental outcomes but also for ensuring that the observed effects are not confounded by light-induced tissue damage.

      To address whether CF activity is suppressed throughout the extended period, we included new in-vivo recordings demonstrating robust suppression of CF transmission, as evidenced by inhibited complex spikes sustained at 40 minutes of optostimulation. Regarding potential tissue damage, our optogenetic protocol used a light intensity (13 mW), which is much lower than the 75 mW threshold reported by Cardin et al. (2010) as sufficient to maintain normal neuronal activity. Moreover, critical damage typically requires intensities exceeding 100 mW for several hours (Cardin, Jessica A., et al. "Targeted optogenetic stimulation and recording of neurons in vivo using cell-type-specific expression of Channelrhodopsin-2." Nature protocols 5.2 (2010): 247-254.). Finally, we observed no abnormalities in long-term memory consolidation or gain retrieval (Figures 3C, 4C, 4F), further supporting that our light stimulation did not induce tissue damage.

      (4) The generalizability of their findings to various learning behaviors remains uncertain. Given that the flocculus plays a role in vestibulo-ocular reflex (VOR) adaptation, which encompasses both CF-dependent and CF-independent learning types (gain increase and gain decrease, respectively), this system could offer a more feasible approach for investigating hypotheses about the role of CFs in guiding distinct learning processes.

      In response to the reviewer’s comment on the generalizability of our findings to learning behaviors involving both CF-dependent and CF-independent mechanisms, we acknowledge the importance of examining these dynamics in cerebellar motor adaptation systems, such as the OKR. Although our study used an OKR task, findings from VOR studies apply here. Ke et al. (2009) demonstrated that VOR gain increases (CF-dependent) and gain decreases (CF-independent) involve distinct plasticity processes (Ke, Michael C., Cong C. Guo, and Jennifer L. Raymond. "Elimination of climbing fiber instructive signals during motor learning." Nature neuroscience 12.9 (2009): 1171-1179), suggesting that CF engagement is task-dependent, particularly for larger error signals that require CF-guided adaptation.

      Similarly, our OKR findings suggest that CF-dependent pathways are likely used for large, persistent errors, whereas CF-independent mechanisms may drive more gradual adjustments. This alignment between OKR and VOR systems supports the generalizability of CF-selective adaptation across cerebellar learning tasks. We have elaborated on this point in our revised manuscript (lines 219~237), clarifying how CF-dependent and CF-independent mechanisms can generalize across motor learning contexts in the cerebellum.

      (5) The acute effect of CF suppression on OKR eye movements warrants investigation. If OKR eye movements are altered by their method, this could complicate the interpretation of their results.

      During our experiments, we monitored ocular movements during CF optogenetic manipulation and found no aberrant effects, such as nystagmus. As shown in Figures 4G and 4H, disrupting CF signaling during gain retrieval did not alter the gain, confirming that our manipulation neither acutely affects ocular reflexes nor induces abnormal eye movement. We therefore conclude that the observed effects are specific to learning and memory processes.

      (6) The authors raise the potential issue of inducing presynaptic LTD in CFs. Can they be sure that their manipulation doesn't generate a similar effect? Additional controls or techniques to accurately interpret the results are needed considering this concern.

      Our discussion does not claim that optogenetic suppression directly induces CF-LTD. Instead, we posit that CF suppression may have mimicked the functional consequences of CF-LTD, such as reduced complex spike (Cs) activity and associated calcium signaling. This, in turn, may have indirectly interfered with the induction of parallel fiber-Purkinje cell (PF-PC) LTD, thereby preventing gain enhancement during learning.

      This hypothesis is consistent with previous studies highlighting the interplay between CF and PF synaptic plasticity in cerebellar motor learning. For example, Hansel and Linden (2000) and Weber et al. (2003) discuss how changes at CF synapses can modulate Cs waveforms and calcium dynamics, which are critical for PF-PC LTD. Coesmans et al. (2004) and Han et al. (2007) further elaborate on the necessity of CF input for effective PF-PC LTD induction during learning tasks such as retinal slip correction.

      While our experiments were not designed to directly measure CF-LTD, the observed prevention of gain enhancement aligns with the hypothesis that CF suppression functionally disrupted downstream PF-PC LTD. We have clarified these points in our revised manuscript (lines 250~258) to avoid misunderstanding.

      (7) The specific timeframe for OKR consolidation remains uncertain, with evidence from numerous studies indicating that cerebellar memory consolidation unfolds over several days. Therefore, a more thorough investigation into these extended durations, supported by control experiments to validate the outcomes, would significantly strengthen the study's conclusions, and provide clearer insights into the consolidation process of OKR learning.

      Our current study specifically focused on the early phase of the post-learning period, as supported by findings from several studies: Cooke et al. (2004); Titley et al. (2007); Steinmetz et al. (2016); Seo et al. (2024).

      These studies collectively indicate that cerebellar-dependent memory consolidation—including OKR—can occur rapidly during the early consolidation phase. While the specific mechanisms examined in these studies vary (e.g., synaptic plasticity, intrinsic plasticity, or circuit-level changes), they consistently demonstrate that modifications in the cerebellum after the early consolidation period no longer influence memory storage or performance. This evidence strongly supports the relevance of our experimental focus and the timing of our interventions.

      We acknowledge the importance of investigating extended consolidation periods, which could indeed provide additional insights. However, given our current aims, the rapid consolidation dynamics observed in the early phase are most relevant to the questions addressed in this study. We have elaborated on this matter in our revised manuscript (lines 273~283).

      (8) Issues around whether the authors have control over CF activity with their optogenetic intervention raise questions of whether learning can be recovered during the training procedure if the optogenetic stimuli are halted. Specifically, if suppression is applied for three blocks (what the authors refer to as "sessions") during the training procedure and then ceases, does learning rapidly recover in the immediately following blocks?

      While we did not directly examine the restoration of learning capability within the same training session following the cessation of optogenetic inhibition, we believe several aspects of our experimental design and insights from prior studies support our interpretation.

      Our optogenetic intervention specifically targeted Purkinje cells (PCs) in the flocculus and was applied continuously during designated training sessions to modulate cerebellar activity. Notably, Medina et al. (2001) demonstrated that transient inactivation of the cerebellar cortex impairs the expression of learned responses but does not disrupt the underlying plasticity mechanisms (Medina, Javier F., Keith S. Garcia, and Michael D. Mauk. "A mechanism for savings in the cerebellum." Journal of Neuroscience 21.11 (2001): 4081-4089.). This finding suggests that cerebellar plasticity remains intact and functional even after transient perturbations.

      Therefore, it is plausible that once optogenetic inhibition is lifted, the cerebellar network regains its capacity for learning and adaptation, as the intrinsic plasticity and memory encoding processes remain preserved. While we acknowledge that direct experimental confirmation of rapid recovery in our setup was not performed, this interpretation is consistent with our experimental framework and the broader literature.

      (9) The study does not fully explore the instructive signals/mechanisms underlying the memory consolidation process. A detailed investigation into potential instructive signals for consolidation beyond CF-induced signaling, like the simple spiking of PCs, could significantly enhance the study's conclusions. Indeed, there is currently no evidence to suggest that CFs play a role in the consolidation phase anyway so testing their role seems a bit of a strawman argument.

      While our study primarily focused on characterizing CF-dependent pathways, we acknowledge that memory consolidation is likely driven by a multifaceted interplay of instructive signals beyond CF-induced mechanisms. In particular, Purkinje cell (PC) simple spiking may act as a critical signal during the consolidation phase, either complementing or functioning independently of CF input. Emerging evidence suggests that simple spiking can modulate downstream circuitry in ways that stabilize and strengthen memory traces.

      To address this, we have expanded the discussion in the revised manuscript to explore potential instructive signals for consolidation, including PC simple spiking, local circuit plasticity within the cerebellar cortex, and its interaction with the cerebellar nuclei. We propose that these mechanisms collectively contribute to the transfer and stabilization of motor memory, offering a more comprehensive framework for understanding consolidation. We have elaborated on this matter in our revised manuscript (lines 238~250).

      (10) Previous reports have highlighted the necessity of CF activity for extinction/memory maintenance (Medina et al. 2002; Kim et al. 2020). That is, the absence of CF activity is consequential for cerebellar function. These results present a potential contrast to the findings reported in this current study. This discrepancy raises important questions about the experimental conditions, methodologies, and interpretations of CF function across different studies. A thorough discussion comparing these divergent outcomes is essential, as it could elucidate the specific contexts or conditions under which CF activity influences memory processes.

      We acknowledge that previous studies (Medina et al., 2002; Kim et al., 2020) have suggested a role for climbing fiber (CF) activity in extinction. However, our study specifically focuses on the acquisition phase of motor learning and does not extend to extinction or maintenance. As such, we have revised our discussion to limit interpretations strictly to the scope of our findings and removed references to extinction.

      The discrepancies between our results and prior work may arise from differences in methodologies and behavioral paradigms. For instance, we utilized optogenetic inhibition to achieve precise temporal and spatial control of CF activity, whereas previous studies employed pharmacological or lesion methods that may have broader effects on the cerebellar circuitry. Additionally, differences in behavioral paradigms, such as the optokinetic reflex (OKR) task used in our study compared to the eye-blink conditioning tasks in prior studies, may demand distinct roles for CF signaling depending on the specific requirements for error correction and adaptation.

      This clarification is now incorporated into our revised manuscript, and the discussion has been streamlined to focus on the phase-specific role of CF activity during acquisition without extending to extinction or maintenance (lines 259~270).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      The article emphasizes vocal social behavior but none of the experiments involve a social element. Marmosets are recorded in isolation which could be sufficient for examining the development of vocal behavior in that particular context. However, the early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition of cries and subharmonic phees, which are high-entropy calls, to more low-entropy mature phees is affected by social reinforcement from the parents. And this effect extends across contexts, where differences in these interaction patterns extend to vocal behavior when the marmosets are alone. From the chord diagrams, cries still consist of a significant proportion of call types in lesioned animals. Additionally, though it is an intriguing finding that the infants' phee calls have acoustic differences being 'blunted of variation, less diverse and more regular,' the suggestion that the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is not tested but could be tested with, for example, playback experiments.

      We recognize that our definition of vocal social behavior is not within the normal realm of direct social interactions. We were particularly interested in marmoset vocalizations as a social signal, such as phees, cries and twitter, even when their family members or conspecifics are not visibly present. Generally speaking, in the laboratory, infant marmosets make few calls when in the presence of another conspecific, but when isolated they naturally make phee calls to reach out to their distantly located relatives. In this context, while we did not assess the animals interacting directly, we assessed what are normally referred to as ‘social contact calls,’ hence the term ‘social vocalizations.’ Playback recordings might provide potential evidence of antiphonal calling as a means of social interaction and might reveal the poor quality of the social message conveyed by the infant, but even here, the vocalizing marmoset would be calling to a non-visible conspecific. Thus, although our experiment lacked a direct social element, our data suggest that in the absence of a functioning ACC in early life, infant calls that convey social information, and which would elicit feedback from parents and other family members, may be compromised, and this could potentially influence how that infant develops its social interactive skills. We have now commented on the significance of social vocalizations in the introductory text (page 3) and discussion (page 15).

      The manuscript would benefit from the addition of more details to be able to better determine if the conclusions are well supported by the data. Understanding that this is very difficult data to get, the number of marmosets and some variability in the collection of the data would allow for the plotting of each individual across figures. For example, in the behavioral figures, which is the marmoset that is in the behavioral data that has a sparing of the ACC lesion in one hemisphere? Certain figures, described below in the recommendations for the authors, could also do with additional description.

      Thanks for these suggestions. We have plotted the individual animals in the relevant figures and addressed the comments and recommendations listed below.

      Reviewer #1 (Recommendations For The Authors):

      Given the number of marmosets, variability in the collected data, lesion extent, and different controls, I would like to see more plots with individuals indicated (perhaps with different symbols). More details could also be added for several plots.

      Figure 2D (new) and 2E now have plots that represent the individual animals, each represented by a different symbol.

      Figure 2A) Since lesions are bilateral, could you also show the extent of the lesions on the other side for completeness?

      Our intention was to process one hemisphere of each brain for Golgi staining to examine changes in cell morphology in the ACC and associated brain regions following the lesion. Unfortunately, the Golgi stain was unsuccessful. Consequently, we were unable to use the tissue to reconstruct the bilateral extent of the lesion. We did, however, first establish the bilateral nature of the lesion through coronal slices of the animals' MRI scans before processing the intact hemisphere to confirm the bilateral extent of the lesion. The MRI scans (every 5th section) for each control and lesioned animal are compiled in a figure in the supplementary materials (Fig. S1). These scans show that the ACC-lesioned animals have bilateral lesions with one animal (ACC1) showing some sparing in one hemisphere, as we noted in the text. We have now made reference to this supplemental figure in the text (page 5).

      Figure 2B/C) In Figure 2B, control and ACC lesions are in the columns while right next to it in 2C, ACC lesion and control are in the rows. Could these figures be adjusted so that they are consistent?

      We have now adjusted these figures and updated the figure legends accordingly.

      Figure 2C) Is there quantification of the 'loss of neurons and respective increase in glial cells at the lesioned site especially at the interface between gray and white matter'? There are multiple slices for each animal.

      Thanks for suggesting this. We have now quantified these data which are presented as a new graph as Fig. 2D. These data revealed a significant loss of neurons (NeuN) in the ACC group as well as an increase in glial cells (GFAP and Iba1) relative to the controls. The figure legend and results have also been updated.

      Figure 2C) It is difficult for me to distinguish between white and purple - could you show color channels independently since images were split into separate channels for each fluorophore?

      Fig. 2C has been revised to better visualize the neurons and glia at the gray and white matter interface. We found that grayscale images for each channel offered a better contrast than separating the channels for each fluorophore.

      Figure 2C/D) I like how there are individual dots here for the individual marmosets. Since there are four in each group, could they be represented throughout with symbols (with a key indicating the pair and also the control condition)? For example, were there changes in the histology for control animals that got saline injections as opposed to those that didn't get any surgery?

      We have highlighted the individual animals with different symbols in the figures. Although some animals were twin pairs, it was not possible to have twins in all cases. Only two sets were twins. We have indicated the symbols that represent the twin pair in Fig. 2 as well as the MRI scans of the twin pairs in Fig. S1. There were no observed changes in histology for the sham animals relative to the other non-sham controls. The MRI scan for one sham CON2 shows herniated tissue in the right hemisphere which is a normal consequence of brain exposure caused by a craniotomy.

      Figure 3D-E) Here, individual data points could be informative especially given that some animals are missing data past the third week.

      To prevent cluttering the figure with too many data points, we have added the sample size for each group in the figure legend (page 33).

      Figure 3D/F) What exactly is the period that goes into this analysis? In the text, 'Further analysis showed that the ACC lesion had minimal effects on the rate of most call types during this period'. Is this period from weeks 3 to 6 relative to the proportions in week 2? I think I also don't quite understand the chord diagram. The legend says 'the numbers around each chord diagram represents relative probability value for each call type transition' so how does that relate to the proportion of these call types? It looks like there is a wider slice for cries for ACC-lesioned animals each week. I also don't see in the week 4 chord diagram, the text description of "elevation in the rate of 'other' calls, which comprised tsik, egg, eck, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion."

      We apologize for the confusion. Fig 3D and Fig 3F are not directly related. Fig. 3D shows the different types of emitted calls. The figure shows the averaged data per group pooled from post-surgery weeks (week 3 – week 6). It represents the proportion of individual call types relative to the total number of calls during each recording period. The only major finding here was the increased rate of ‘other’ calls comprising tsik, egg, ock, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.

      While Fig. 3D represents the differences in the proportion of calls, the chord diagrams in Fig. 3F represent the probability of call-to-call transition obtained from a probability matrix. At postnatal week 6, marmosets with ACC lesions showed a higher likelihood of transitions between all call types, but less frequent transitions between social contact calls relative to sham controls. The chord diagrams visualize the weighted probabilities and directionality of these transitions between the different call types. Weighted probabilities were used to account for variations in call counts. The thickness of the arrows or links indicates the probability of a call transition, while the numbers surrounding each chord diagram represent the relative probability value for each specific transition. We have now reworded the text and clarified these details in the figure legend (pages 32-33).
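
      As a rough illustration of how a call-to-call transition matrix of this kind can be built, the sketch below counts consecutive call pairs and normalizes each row to give P(next call | current call). The function name, the example sequence, and the row normalization are assumptions made for illustration; the response does not specify the weighting scheme used for Fig. 3F, so this should not be read as the authors' exact procedure.

```python
from collections import Counter, defaultdict


def transition_probabilities(calls):
    """Return {from_call: {to_call: P(to_call | from_call)}}.

    Hypothetical helper: counts each pair of consecutive calls and
    normalizes by the number of transitions leaving the first call.
    """
    counts = defaultdict(Counter)
    for current, following in zip(calls, calls[1:]):
        counts[current][following] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Made-up call sequence from a single recording (not real data).
sequence = ["phee", "phee", "cry", "phee", "twitter", "phee", "cry"]
matrix = transition_probabilities(sequence)

for src, row in matrix.items():
    for dst, p in row.items():
        print(f"{src} -> {dst}: {p:.2f}")
```

      Normalizing each row by its own total is one simple way to keep frequent call types from dominating the diagram; other weightings are possible.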

      Figure 3E) How is the ratio on the y-axis calculated here?

      The y-axis represents the averaged value of the ratios of the number of social contact calls relative to non-social contact calls in each recording per subject per group (i.e., x̄(# social calls / # non-social calls)). This is now included in the figure legend and the axis is updated (page 32).
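
      The calculation described above can be sketched as follows. The call-type grouping follows the response (phee, twitter and trill as social contact calls), but the function names, the example counts, and the choice to drop cries from both sides of the ratio are assumptions for illustration only.

```python
from statistics import mean

# Social contact calls as listed in the response; cries are treated
# separately and (as an assumption here) excluded from the ratio.
SOCIAL_CALLS = {"phee", "twitter", "trill"}
EXCLUDED_CALLS = {"cry"}


def social_ratio(call_counts):
    """Ratio of social to non-social call counts for one recording."""
    social = sum(n for call, n in call_counts.items() if call in SOCIAL_CALLS)
    non_social = sum(
        n
        for call, n in call_counts.items()
        if call not in SOCIAL_CALLS and call not in EXCLUDED_CALLS
    )
    if non_social == 0:
        return None  # ratio undefined for this recording
    return social / non_social


def mean_social_ratio(recordings):
    """Average the per-recording ratios, skipping undefined ones."""
    ratios = [r for r in map(social_ratio, recordings) if r is not None]
    return mean(ratios) if ratios else None


# Made-up counts for two recordings (not real data).
recordings = [
    {"phee": 6, "twitter": 2, "tsik": 4, "cry": 10},
    {"phee": 3, "trill": 1, "ock": 2, "chatter": 2},
]
print(mean_social_ratio(recordings))
```

      Averaging per-recording ratios, rather than pooling counts first, gives each recording equal weight regardless of how many calls it contains.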

      Also, cries could be considered a 'social contact call' since they are produced by infants to elicit responses from the parents. There is also the hypothesis in the literature that cries transition into phees.

      The reviewer is correct. Cries are often considered a social contact call because they elicit parental feedback. We decided to separate cry-calls from other social contact calls for two reasons. First, in our sample, we found cry behavior to be highly variable across the animals. For example, one control infant cried incessantly whereas another control infant cried less than normal. This extreme variability in animals of the same group masked the features between animals that reliably differentiated between them. Second, cry-calls elicit feedback from parents who are normally within the vicinity of the infant whereas phee calls elicit antiphonal phee calls from any distantly located conspecific. In other words, the contexts in which these calls are often elicited are very different.

      The use of 'syntactical' is a bit jarring to me because outside of linguistics, its use in animal communication generally refers to meaning-bearing units that can be combined into well-formed complexes such as pod-specific whale songs or predator alarm calls with concatenated syllable types in some species of monkeys. To my knowledge, individual phee syllables have not been currently shown to carry information on their own and may be better described as 'sequential' rather than 'syntactical'.

      We agree. We have made this change accordingly.

      Figure 4B) How many phee calls with differing numbers of syllables are present each week? How equal is the distribution given that later analyses go up to 5 syllables?

      The total number of phee calls with differing numbers of syllables ranged from 20 to 40 phees. This number varied between subjects, per week. The most common were 3- and 4-syllable phee calls which ranged from 7-15. Due to this variability, Fig. 4B presents the average syllable count. The axis is now updated.

      Figure 4C-E) How is the data combined here? Is there a 2nd syllable, the combined data from the 2nd syllable from phee calls of all lengths (1 - 5?). If so, are there differences based on how long the total sequence is?

      The combined data represents the specific syllable (e.g., the 1st syllable in a 2-syllable phee, in a 3-syllable phee and in a 4-syllable phee) irrespective of the length of the sequence. No differences were observed between the 2nd syllable in a 2-syllable phee and the 2nd syllable in a 3- or 4-syllable phee. We have included this detail in the figure legend (pages 33-34).

      So duration is a vocal parameter that is highly dependent on physical factors such as body size and lung volume; were there differences in physical growth between the pairs of ACC-lesioned marmosets and their twins? Entropy is less closely tied to these physical factors but has previously been shown to decrease as phee calls mature, which we can also see in the negative relationship of the control animals. Do you know of experiments that show that lower entropy calls are more 'blunted'?

      Thank you for raising the important issue of physical growth factors. For twin pairs, it is not uncommon for one infant to be slightly bigger, heavier or stronger than the other, presumably because one gets more access to food. With increasing age, we did not observe significant differences in body weight between the groups. We examined grip strength in all infants as a means of assessing how well the infant was able to access food during nursing. Poor grip strength would indicate a lower propensity to ‘hang on’ to the mother for nursing which could lead to lower weight gain and reduced physical growth. We found that both grip strength and body weight increased as the infants got older and both parameters were equivalent. We have added a figure showing the normal increase in both weight and grip strength to the supplemental materials (Fig. S3) and have made reference to this in the text (page 8).

      As for entropy, its impact on the emotional quality of vocalizations has not been systematically explored. Generally speaking, high entropy relates to high randomness and distortion in the signal. Accordingly, one view posits that low-entropy phee calls represent mature sounding calls relative to noisy and immature high-entropy calls (e.g., Takahashi et al 2017). In the current study, the reduction in syllable entropy observed for both groups of animals with increasing age is consistent with this view. At the same time entropy can relate to vocal complexity; high entropy refers to complex and variable sound patterns whereas low entropy sounds are predictable, less diverse and simple vocal sequences (Kershenbaum, A. 2013. Entropy rate as a measure of animal vocal complexity. Bioacoustics, 23(3), 195–208). One possibility is that call maturity does not equate directly to emotional quality. In other words, a low-entropy mature call can also be lacking in emotion as observed in humans with ACC damage; these patients show mature speech, but they lack the variations in rhythms, patterns and intonation (i.e., prosody) that would normally convey emotional salience and meaning. Our observation of a reduction in phee syllable entropy in the ACC group in the context of being short and loud with reduced peak frequency is consistent with this view. Our use of the word ‘blunt’ was to convey how the calls exhibited by the ACC group were potentially lacking emotional meaning. Beyond this speculation, we are not aware of any papers that have examined the relationship between entropy and blunted calls directly. We have now included this speculation in the discussion (pages 12-13).

      Reviewer #2 (Public Review):

      The authors state that the integrity of white matter tracts at the injection site was impacted but do not show data.

      We have added representative micrographs of a control and ACC-lesioned animal in a new supplementary figure which shows the neurotoxin impacted the integrity of white matter tracts local to the site of the lesion (Fig. S2).

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals.

      We agree. Our original intention was to examine behavior into adulthood. Unfortunately, the COVID-19 pandemic compromised the continuation of the study. We were limited by the data that we were allowed to acquire due to imposed restrictions. Some non-vocalization data collected when the animals were young adults is currently being prepared for another paper.

      Even though this study focuses entirely on the development of social vocalizations, providing data about altered social non-vocal behaviors that accompany ACC lesions is missing. This data can provide further insights and generate new hypotheses about the exact role of ACC in social vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development?

      We agree. At the time, however, apparatus for assessing behavior between the infant’s family and non-family members was not available. Assessing such behaviors in the animals’ holding room posed some difficulty since marmosets are easily distracted by other animals as well as the presence of an experimenter, amongst other things. This is an area of investigation we are currently pursuing.

      Reviewer #3 (Public Review):

      It is striking to find that the vocal repertoire of infant marmosets was not significantly affected by ACC lesions. During development, the neural circuits are still maturing and the role of different brain regions may evolve over time. While the ACC likely contributes to vocalizations across the lifespan, its relative importance may vary depending on the developmental stage. In neonates, vocalizations may be more reflexive or driven by physiological needs. At this stage, the ACC may play a role in basic socioemotional regulation but may not be as critical for vocal production. Since the animals lived for two years, further analysis might be helpful to elucidate the precise role of ACC in the vocal behavior of marmosets.

      Figure 3D. According to the Introduction "...infant ACC lesions abolish the characteristic cries that infants normally issue when separated from its mother". Are the present results in marmosets showing the opposite effect? Please discuss.

      To date, the work of MacLean (1985) is the only publication that describes the effect of early cingulate ablation on the spontaneous production of ‘separation calls’ largely construed as cries, coos and whimpers in response to maternal separation. All of this work was largely performed in rhesus macaques or squirrel monkeys. In addition to ablating the cingulate cortex, MacLean found that it was necessary to ablate the subcallosal (area 25) and preseptal cingulate cortex (presumably referring to prelimbic area 32) to permanently eliminate the spontaneous production of separation cry calls. Our ablation of the ACC was more circumscribed to area 24 and is therefore consistent with MacLean’s earlier work that removal of ACC alone does not eliminate cry behavior. In adults, ACC ablation is insufficient to eliminate vocalization as well. We make reference to this on pages 13-14 of the discussion.

      Figure 3E and Discussion. Phees are mature contact calls and cries immature contact calls (Zhang et al, 2019, Nat Commun). Therefore, I would rather say that the proportion of immature (cries) contact calls increases vs the mature (phee, trill, twitters) contact calls in the ACC group. Cries are also "isolated-induced contact calls" to attract the attention of the caregivers.

      The reviewer is correct in that cries are directed towards caregivers but in our sample, cry behavior was highly variable between the infants. Consequently, in Fig. 3E social contact calls include phee, twitter and trill calls but do not include cries, which were separated (see also response to reviewer #1). Many of the calls made during babbling were immature in their spectral pattern (compare phee calls between Fig. 3A and 3B). Cries typically transitioned into phees, twitters or trills before they fully matured. Fig. 3E shows that the controls made more isolation-induced social contact calls at postnatal week 6 which were presumably maturing at this time point. Thus, if anything, there was an increase in the proportion of mature contact calls vs immature contact calls with increasing age.

      Figure 4D. Animal location and head direction within the recording incubator can have significant effects on the perceived amplitude of a call. Were these factors taken into account?

      The reviewer makes an excellent observation. Unfortunately, we did not account for location and head direction because the infants were quite mobile in the incubator. The directional microphone was hidden from view because the infants were distracted by it, and positioned ~12 cm from the marmoset, and placed in the exact same location for every recording. In addition, calls with phantom frequencies were eliminated during visual inspection of spectrograms. Beyond these details, location and head direction were not taken into account.

      Figure 4E. When a phee call has a higher amplitude, as is the case for the ACC group (Figure 4D), the energy of the signal will be concentrated more strongly at the phee call frequency ~8KHz. This concentration of the energy reduces the variability in the frequency distribution, leading to lower entropy. The interpretation of the results should be reconsidered. A faint call (control group) can exhibit more variability in the frequency content since the energy is distributed across a wider range of frequencies contributing to higher entropy. It can still be "fixed, regular, and stereotyped" if the behavior is consistent or predictable with little variation. Also, to define ACC calls as "monotonic" I would rather search for the lack of frequency modulation, amplitude variation, or narrower bandwidth.

      We very much appreciate this explanation. We were able to identify the maximum frequency that closely matched the pitch of a sound for each syllable in a multisyllabic phee. New Fig. 4E shows that the peak frequency for each phee syllable was lower in the ACC-lesioned monkeys which may directly translate to the low entropy observed in this group. The term “monotonic” was used to relate our data to the classical and long-standing evidence of human ACC lesions causing monotonous intonation of speech. When all factors are taken into account, it is evident that the vocal phee signature of the ACC-lesioned animal was structurally different to the controls implicating a less complex and stereotyped ACC signal. Further studies are needed to systematically explore the relationship between entropy and emotional quality of vocalizations.
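
      The reviewer's point about amplitude and entropy can be illustrated with a toy calculation. The sketch below computes a normalized Shannon entropy over a power spectrum and compares a faint and a loud spectral peak sitting on the same flat noise floor. The function names, bin count, and power values are hypothetical, and the entropy measure used in the manuscript may be defined differently, so this is only a demonstration of the general principle.

```python
import math


def spectral_entropy(power):
    """Shannon entropy of a power spectrum, scaled to the range [0, 1].

    The spectrum is normalized to sum to 1 and treated as a probability
    distribution over frequency bins; 1.0 means perfectly flat.
    """
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    entropy_bits = -sum(p * math.log2(p) for p in probs)
    return entropy_bits / math.log2(len(power))


def toy_spectrum(peak_power, n_bins=64, noise_power=1.0, peak_bin=32):
    """Flat noise floor with a single peak (illustrative values only)."""
    spectrum = [noise_power] * n_bins
    spectrum[peak_bin] += peak_power
    return spectrum


faint = spectral_entropy(toy_spectrum(peak_power=10.0))
loud = spectral_entropy(toy_spectrum(peak_power=1000.0))
print(f"faint call: {faint:.3f}, loud call: {loud:.3f}")
```

      With the noise floor held constant, the louder peak holds a larger share of the total energy, so its normalized entropy is lower even though the shape of the call itself is unchanged.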

      Apart from the changes in the vocal behavior, did the ACC lesions manifest in any other observable cognitive, emotional, or social behavior? ACC plays a role in processing pain and modulating pain perception. Could that be the reason for the observed increase in the proportion of cries in the ACC group and the increase in the phee call amplitude? Did the cries in the ACC group also display a higher amplitude than the cries in the control group?

      It was our intention to acquire as much data as possible from these infants as they matured from a cognitive, social and emotional perspective. Unfortunately, our study was hampered by a variety of factors, including the COVID-19 pandemic, which imposed major restrictions on our ability to continue with the experiment in a time-sensitive manner. In addition, the development and construction of the custom apparatus to measure these behaviors was stalled during this period, further preventing us from collecting behavioral data at regular time intervals. As for the cry behavior, the number of cries in the ACC group was very low, especially at postnatal weeks 5 and 6. Consequently, there were very few data points to work with.

      Discussion. Louder calls have the potential to travel longer distances compared to fainter calls, possess higher energy levels, and can propagate through the environment more effectively. If the ACC group produced louder phee syllables, how could be the message conveyed over long distances "deficient, limited, and/or indiscriminate"?

      Thanks for raising this interesting concept. Not all calls emitted by the animals were loud. We specifically examined the long-distance phee call in this regard. The phee syllables emitted by the ACC group were high amplitude with low frequencies, short duration and low entropy. Taking these factors into account, it is conceivable that the phee calls produced by the ACC group could not effectively convey their message over long distances despite their propagation through the environment. We have made reference to this in the discussion, where we focus specifically on the phee calls (page 12).

      Abstract: Do marmosets have syntax? Consider replacing "syntactical" with a more appropriate term (maybe "syntax-like").

      Thanks for this suggestion. We have replaced the term syntactical with ‘sequential’ as per the recommendation of reviewer #1.

      Introduction: "...cries that infants normally issue when separated from its mother". Please replace "its" with "their".

      This has been corrected.

      Results: Is the reference to Fig 1B related to the text?

      We have included and referred to Fig. 1B in the text (results and methods) to show other researchers how they can use this technique as a reliable and safe means of monitoring tidal volume under anesthesia in small infant marmosets without intubation.

      I understand that both "spectrograph" and "spectrogram" are used to analyze the frequency content of a signal. Nevertheless, "spectrogram" refers to the visual representation of the frequency content of a signal over time, and this term is commonly used in audio signal processing and specifically in the vocal communication field. I would recommend replacing "spectrograph" with "spectrogram".

      Thanks for this suggestion. We have corrected this throughout the manuscript.

      (Concerning the previous comment in the public review). Cries are uttered to attract the attention of the caregivers. The increase in the proportion of cries in the ACC group does not match the sentence: "...these infants appeared to make little effort in using vocalizations to solicit social contact when socially isolated".

      We apologize for the confusion. It is not the case that the ACC animals make more cries. Cry calls were highly variable amongst the animals. Consequently, although Fig. 3D gives the impression that the proportion of cries is higher in ACC animals, they did not differ significantly from the controls. Due to their high variability, cries were removed in the measurement of social contact. Accordingly, Fig. 3E does not include cry behavior; it shows that the ACC animals engage less in social contact calls.

      Related to Figure 3. What is the difference between "egg" and "eck" calls? Do you mean "ock"?

      We apologize. This was a typo. It should be ock calls.

      Figure 4B. Is the sample size five animals per group and per week? Overlapping data points seem to be placed next to each other. Why in some groups (e.g. ACC 6 weeks) less than five dots are visible?

      The sample size differed per week because of the lack of recording during the COVID restrictions. In Fig. 4B, we have now separated the overlapping dots. We have also added the sample size of the groups in the figure legends.

      Would the authors expect to see stronger differences between the lesioned and the control groups when comparing a later developmental stage? The animals were euthanized at the age of

      This speculation is certainly feasible and yes, we were hoping to establish this level of detail with testing at later developmental stages. This is an aspect of development we are currently pursuing.

      Could these experiments be conducted?

      I’m afraid these animals are no longer available, but we are currently conducting experiments in other animals with early life neurochemical manipulations who show behavioral changes into early adulthood.

      ACC lesion: It is reported that the lesions extended past 24b into motor area 6M. Did the animal display any motor control disability?

      Surprisingly, despite the lesion encroaching into 6M, these animals showed no observable motor impairment. We assessed the animals’ grip strength and body weight and discovered normal strength and growth in weight in both controls and the lesioned group. We have added these data as supplemental information (Fig. S3).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 323-330 in untracked version). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 315-333), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 323-330).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Per the reviewer’s suggestion, we have added several new supplementary figures. We now show the F-statistic for discriminability over time for the LFP timecourse (Fig. S2), and as a function of power in various frequencies (Fig. S4). We have added before/after inactivation comparisons of the LFP and spiking activity, and their respective F-statistics for discrimination between contrasts and orientations in Fig. S9. Lastly, we added a supplementary figure evaluating the impact of FEF inactivation on beta phase coding in the OUT condition, showing no significant change (Fig. S11).

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We have updated the STA slope measurement, excluding the low contrast condition, which lacks a clear peak in the STA. Additionally, we equalized the bin widths and aligned the x-axes for better visual comparability. Then, we performed a two-way ANOVA, analyzing the effects of spatial features (IN vs. OUT) and visual conditions (contrast and orientation). The results showed a significant effect of the visual feature for both orientation (F = 3.96, p=0.046) and contrast (F = 14.26, p<10<sup>-3</sup>). However, neither the spatial feature nor the spatial-visual interaction exhibited significant effects for orientation (F = 0.52, p=0.473 and F=1.56, p=0.212, respectively) or contrast (F = 2.19, p=0.139 and F=1.15, p=0.283, respectively).
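
      For readers unfamiliar with the test, the following is a minimal sketch of how the F statistics of a two-way ANOVA (main effect of each factor plus their interaction) can be computed. It is an illustration only, not the authors' analysis code: it assumes a balanced design with independent observations, and the function name `two_way_anova_f` and the nested-list layout of `cells` are hypothetical. P-values are omitted because the F distribution is not in the Python standard library; something like `scipy.stats.f.sf(F, df_effect, df_error)` would supply them.

```python
from statistics import fmean


def two_way_anova_f(cells):
    """F statistics for a balanced two-way ANOVA with replication.

    cells[i][j] is a list of observations for level i of factor A
    (e.g. IN vs. OUT) and level j of factor B (e.g. contrast level).
    Every cell must hold the same number (>= 2) of observations.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != b for row in cells) or any(
        len(c) != n for row in cells for c in row
    ):
        raise ValueError("balanced design required")
    if n < 2:
        raise ValueError("need at least two observations per cell")

    grand = fmean(x for row in cells for c in row for x in c)
    mean_a = [fmean(x for c in row for x in c) for row in cells]
    mean_b = [fmean(x for row in cells for x in row[j]) for j in range(b)]
    mean_ab = [[fmean(c) for c in row] for row in cells]

    # Sums of squares for factor A, factor B, interaction, and error.
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum(
        (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    ss_e = sum(
        (x - mean_ab[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for x in cells[i][j]
    )

    df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_e = ss_e / df_e  # error mean square (must be non-zero)
    return {
        "F_A": (ss_a / df_a) / ms_e,
        "F_B": (ss_b / df_b) / ms_e,
        "F_AB": (ss_ab / df_ab) / ms_e,
        "df": (df_a, df_b, df_ab, df_e),
    }
```

      In this layout, `F_A` would correspond to the spatial factor, `F_B` to the visual feature, and `F_AB` to their interaction. A repeated-measures design across neurons would need a different error term.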

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) The authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 185-186).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      While the shift in peak frequency across contrasts is more prominent than that across orientations (Fig. S3A-B), the relationship between orientation and peak frequency is also significant (one-way ANOVA for peak frequency across contrasts, F<sub>Contrast</sub>=10.72, p<10<sup>-4</sup>; or across orientations, F<sub>Orientation</sub>=3, p=0.030; stats have been added to Fig. S3 caption). This finding also aligns with previous studies, which reported slight peak frequency shifts (~1–2 Hz) in the context of attention (Fries, 2015). To address the question of whether the frequency-firing rate correlation generalizes to orientation-driven changes, we now examine the relationship between peak frequency and firing rate separately for each contrast level (Fig. S14). The average normalized response as a function of peak frequency, pooled across subsamples of trials from each of 145 V4 neurons (100 subsamples/neuron), IN vs. OUT conditions, shows a significant correlation during the delay period for each contrast (ANCOVA): low contrast (F<sub>Condition</sub>=0.03, p=0.867; F<sub>Frequency</sub>=141.86, p<10<sup>-18</sup>; F<sub>Interaction</sub>=10.70, p=0.002), middle contrast (F<sub>Condition</sub>=7.18, p=0.009; F<sub>Frequency</sub>=96.76, p<10<sup>-14</sup>; F<sub>Interaction</sub>=0.13, p=0.716), and high contrast (F<sub>Condition</sub>=12.51, p=0.001; F<sub>Frequency</sub>=333.74, p<10<sup>-29</sup>; F<sub>Interaction</sub>=7.91, p=0.006).

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 315-333). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimulus features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      Looking at the SNR as a ratio of power in the beta band to all other bands, there is no significant drop in SNR between conditions (SNR<sub>IN</sub> = 4.074±984, SNR<sub>OUT</sub> = 4.333±0.834, p=0.341, Wilcoxon signed-rank). Therefore, we do not think that the change in phase coding is merely a result of less reliable phase estimates.
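
      As a rough illustration of an SNR of this kind, the sketch below divides the mean power inside a beta band by the mean power at all other frequencies of a power spectrum. It is not the authors' code: the function name `beta_snr`, the default band edges (14 to 22 Hz), and the use of mean rather than summed power are all assumptions made for the example.

```python
def beta_snr(freqs_hz, power, beta_band=(14.0, 22.0)):
    """Ratio of mean power inside the beta band to mean power outside it.

    freqs_hz and power are equal-length sequences describing one power
    spectrum. The band edges are placeholders, not values from the paper.
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    lo, hi = beta_band
    inside = [p for f, p in zip(freqs_hz, power) if lo <= f <= hi]
    outside = [p for f, p in zip(freqs_hz, power) if not (lo <= f <= hi)]
    if not inside or not outside:
        raise ValueError("spectrum must cover both in-band and out-of-band bins")
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))
```

      Computing this value per recording site in each condition would yield paired IN and OUT samples, which is the kind of data a Wilcoxon signed-rank test compares.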

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 291-293). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL when remembering the V4 RF location) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, the two points I raised in the public review merit a bit of development in the Discussion. In addition, the authors should revise some of their conclusions.

      For instance (L217):

      "The finding that WM mainly modulates phase coded information within extrastriate areas fundamentally shifts our understanding of how the top-down influence of prefrontal cortex shapes the neural representation, suggesting that inducing oscillations is the main way WM recruits sensory areas."

      In my opinion, this one is over-the-top on various counts.

      Here is another exaggerated instance (L298):

      "...leading us to conclude that representations based on the average firing rate of neurons are not the primary way that top-down signals enhance sensory processing."

      Again, as noted above, the problem is that one could make the case that the top-down signals are, in fact, highly effective, since they are completely quashing any distracter-related modulation in firing rate across RFs. There is only so much that one can conclude from responses to stimuli that are task-irrelevant, uniform across space, and constant over the course of a trial.

      I think even the title goes too far. What the work shows, by all accounts, is that the sustained activity in FEF has a definitive impact on V4 *even* with respect to a sustained, irrelevant background stimulus. The result is very robust in this sense. However, this is quite different from saying that the *primary* means of functional control for FEF is via phase coding. Establishing that would require ruling out other forms of control (i.e., rate coding) in all or a wide range of experimental conditions. That is far from the restricted set of conditions tested here and is also at variance with many other experiments demonstrating effects of attention or even FEF microstimulation on V4 firing activity.

      To reiterate, in my opinion, the work is carefully executed and the data are interesting and largely unambiguous. I simply take issue with what can be reliably concluded, and how the results fit with the rest of the literature. Revisions along these lines would improve the readability of the paper considerably.

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      Reviewer #3 (Recommendations for the authors):

      (1) My primary comment that came up multiple times as I read the manuscript (and which is summarized above) is that I wasn't ever sure why the authors are focused on analyzing neural coding of task-irrelevant sensory information during a WM task as a function of WM contents (remembered location). Most studies of neural codes supporting WM often focus on coding the remembered information - not other information. Conceptually, it seems that the brain would want to suppress - or at least not enhance - representations of task-irrelevant information when performing a demanding task, especially when there is no search requirement, and when there is no feature correspondence between the remembered and viewed stimuli. (i.e., the interaction between WM and visual input is more obvious for visual search for a remembered target). Why, in theory, would a visual region need to improve its coding of non-remembered information as a function of WM? This isn't meant to detract from the results, which are indeed very interesting and I think quite informative. The authors are correct that this is certainly relevant for sensory recruitment models of WM - there's clear evidence for a role of feedback from PFC to extrastriate cortex - but what role, specifically, each region plays in this task is critical to describe clearly, especially given the task-irrelevance of the input. Put another way: what if the animal was remembering an oriented grating? In that case, MI between spike-based measures and orientation would be directly relevant to questions of neural WM representations, as the remembered feature is itself being modeled. But here, the focus seems to be on incidental coding.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Whether similar phase coding is also used to represent the content of object WM (for example, if the animal was remembering an oriented grating), or whether phase coding is only observed for WM’s modulation of the representation of incoming sensory signals, is an important question to be addressed in future work.

      (2) Related to the above, the phrasing of the second sentence of the Discussion (lines 291-292) is ambiguous - do the authors mean that the FEF sends signals that carry WM content to V4, or that FEF sends projections to V4, and V4 has the WM content? As presently phrased, either of these are reasonable interpretations, yet they're directly opposing one another (the next sentence clarifies, but I imagine the authors want to minimize any confusion).

      We have edited this sentence to read, “Within prefrontal areas, FEF sends direct projections to extrastriate visual areas, and activity in these projections reflects the content of WM.”

      (3) I'm curious about how the authors consider the spatial WM task here different from a cued spatial attention task. Indeed, both require sustained use of a location for further task performance. The section of the Discussion addressing similar results with attention (lines 307-311) presently just summarizes the similarities of results but doesn't offer a theoretical perspective for how/why these different types of tasks would be expected to show similar neural mechanisms.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      (4) As far as I can tell, there is no consideration of behavioral performance on the memory-guided saccade task (RT, precision) across the different stimulus background conditions. This should be reported for completeness, and to determine whether there is an impact of the (likely) task-irrelevant background on task performance. This analysis should also be reported for Figure 3's results characterizing how FEF inactivation disrupts behavior (if background conditions were varied, see point 7 below).

      We have added the effect of inactivation on behavioral RT and % correct across the different stimulus background conditions (Fig. S8). Background contrast and orientation did not impact either RT or % correct.

      (5) Results from Figure 2 (especially Figures 2A-B) concerning phase-locked spiking in V4 should be shown for 0%-contrast trials as well, as these trials better align with 'typical' WM tasks.

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (6) The magnitude of SPL difference in aggregate (Figure 2B) is much, much smaller than that of the example site shown (Figure 2A), such that Figure 2A's neuron doesn't appear to be visible on Figure 2B's scatterplot. Perhaps a more representative sample could be shown? Or, the full range of x/y axes in Figure 2B could be plotted to illustrate the full distribution.

      We have updated Fig. 2A with a more representative sample neuron.

      (7) I'm a bit confused about the FEF inactivation experiments. In the Methods (lines 512-513), the authors mention there was no background stimulus presented during the inactivation experiment, and instead, a typical 8-location MGS task was employed. However, in the results on pg 8 (Lines 201-214), and Figure 3G, the authors quantify a phase code MI. The previous phase code MI analysis was looking at MI between each spike's phase and the background stimulus - but if there's no background, what's used to compute phase code MI? Perhaps what they meant to write was that, in addition to the primary task with a manipulation of background properties, an 8-location MGS task was additionally employed.

      The reviewer is correct that both tasks were used after inactivation (the 8-location task to assess the spread of the behavioral effect of inactivation, and the MGS-background task for measuring MI). We have edited the methods text to clarify.
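
      To make the dependence on a background stimulus concrete, here is a minimal sketch of a plug-in (histogram) estimate of the mutual information between the LFP phase at which each spike occurs and the stimulus shown on that trial. It is an illustration under stated assumptions, not the authors' estimator: the function names, the choice of 8 phase bins, and the absence of any bias correction are hypothetical. Without a varying background stimulus there is no `stimuli` variable, so this quantity cannot be computed.

```python
import math
from collections import Counter


def phase_bin(phase_rad, n_bins):
    """Map a phase in radians to a bin index in [0, n_bins)."""
    frac = (phase_rad % (2 * math.pi)) / (2 * math.pi)
    return min(int(frac * n_bins), n_bins - 1)


def mutual_information_bits(phases, stimuli, n_bins=8):
    """Plug-in estimate of I(phase bin; stimulus) in bits.

    phases[k] is the LFP phase (radians) of spike k, and stimuli[k] is the
    label of the stimulus present when spike k occurred.
    """
    n = len(phases)
    if n == 0 or n != len(stimuli):
        raise ValueError("phases and stimuli must be non-empty and equal length")
    bins = [phase_bin(p, n_bins) for p in phases]
    joint = Counter(zip(bins, stimuli))
    count_bin = Counter(bins)
    count_stim = Counter(stimuli)
    mi = 0.0
    for (b, s), c in joint.items():
        # p(b, s) * log2( p(b, s) / (p(b) * p(s)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_bin[b] * count_stim[s]))
    return mi
```

      Plug-in estimates are biased upward for small samples, so in practice they are usually compared against a shuffled-label baseline.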

      (8) How is % Correct defined for the MGS task? (what is the error threshold? Especially for the results described in lines 192-193).

      The % correct is defined as correctly completed trials divided by the total number of trials; the target window was a circle with a radius of 2 or 4 dva (depending on cue eccentricity). These details have been added to the Methods.
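
      The definition above can be sketched as follows. This is an illustrative example, not the authors' code: the function names and the trial dictionary keys are hypothetical, and because the eccentricity rule that selects the 2 or 4 dva radius is not given here, the radius is passed in per trial.

```python
import math


def is_correct(endpoint_xy, target_xy, window_radius_dva):
    """True if the saccade endpoint lands inside the circular target window."""
    dx = endpoint_xy[0] - target_xy[0]
    dy = endpoint_xy[1] - target_xy[1]
    return math.hypot(dx, dy) <= window_radius_dva


def percent_correct(trials):
    """Percentage of trials whose saccade endpoint falls in the target window.

    Each trial is a dict with keys 'endpoint' and 'target' (x, y in dva)
    and 'radius' (window radius in dva).
    """
    if not trials:
        raise ValueError("no trials")
    n_correct = sum(
        is_correct(t["endpoint"], t["target"], t["radius"]) for t in trials
    )
    return 100.0 * n_correct / len(trials)
```

      For example, an endpoint 0.5 dva from the target counts as correct with a 2 dva window, whereas an endpoint 5 dva away is an error even with the 4 dva window.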

      (9) The paragraph from lines 183-200 describes a number of behavioral results concerning "scatter" and "RT" - the RT shown seems extremely high, and perhaps is normalized. Details of this normalization should be included in the Methods. The "scatter" is listed as dva, but it's not clear how scatter is quantified (std dev of endpoint distribution? Mean absolute error), nor how target eccentricity is incorporated (as scatter is likely higher for greater target eccentricity).

      We have renamed ‘scatter’ to ‘saccade error’ in the text to match the figure, and now provide details in the Methods section. Both RT and saccade error are normalized for each session; details are now provided in the Methods. Since error was normalized for each session before performing population statistics, no other adjustment for eccentricity was made.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors propose a new model of biologically realistic reinforcement learning in the direct and indirect pathway spiny projection neurons in the striatum. These pathways are widely considered to provide a neural substrate for reinforcement learning in the brain. However, we do not yet have a full understanding of mechanistic learning rules that would allow successful reinforcement-learning-like computations in these circuits. The authors outline some key limitations of current models and propose an interesting solution by leveraging learning with efferent inputs of selected actions. They show that the model simulations are able to recapitulate experimental findings about the activity profile in these populations of mice during spontaneous behavior. They also show how their model is able to implement off-policy reinforcement learning.

      Strengths:

      The manuscript has been very clearly written and the results have been presented in a readily digestible manner. The limitations of existing models, which motivate the presented work, have been clearly presented and the proposed solution seems very interesting. The novel contribution of the proposed model is the idea that different patterns of activity drive current action selection and learning. Not only does this allow the model to implement reinforcement learning computations well, but this suggestion may also have interesting implications regarding why some processes selectively affect ongoing behavior and others affect learning. The model is able to recapitulate some interesting experimental findings about various activity characteristics of dSPN and iSPN pathway neuronal populations in spontaneously behaving mice. The authors also show that their proposed model can implement off-policy reinforcement learning algorithms with biologically realistic learning rules. This is interesting since off-policy learning provides some unique computational benefits and it is very likely that learning in neural circuits may, at least to some extent, implement such computations.

      We thank the reviewer for the positive comments.

      Weaknesses:

      A weakness in this work is that it isn’t clear how a key component in the model - an efferent copy of selected actions - would be accessible to these striatal populations. The authors propose several plausible candidates, but future work may clarify the feasibility of this proposal.

      We agree that the biological substrate of the efference copy remains a key open question. We discuss potential pathways in the Discussion section of our manuscript and hope that future experimental studies clarify the question.

      Reviewer #2:

      Summary:

      The basal ganglia is often understood within a reinforcement learning (RL) framework, where dopamine neurons convey a reward prediction error that modulates cortico-striatal connections onto spiny projection neurons (SPNs) in the striatum. However, current models of plasticity rules are inconsistent with learning in a reinforcement learning framework.

      This paper proposes a new model that describes how distinct learning rules in direct and indirect pathway striatal neurons allow them to implement reinforcement learning models. It proposes that two distinct components of striatal activity affect action selection and learning. They show that the proposed implementation allows learning in simple tasks and is consistent with experimental data from calcium imaging data in direct and indirect SPNs in freely moving mice.

      Strengths:

      Despite the success of reward prediction errors at characterizing the responses of dopamine neurons as the temporal difference error within an RL framework, the implementation of RL algorithms in the rest of the basal ganglia has been unclear. A key missing aspect has been the lack of a RL implementation that is consistent with the distinction of direct- and indirect SPNs. This paper proposes a new model that is able to learn successfully in simple RL tasks and explains recent experimental results.

      The authors show that their proposed model, unlike previous implementations, can perform well in RL tasks. The new model allows them to make experimental predictions. They test some of these predictions and show that the dynamics of dSPNs and iSPNs correspond to model predictions.

      More generally, this new model can be used to understand striatal dynamics across direct and indirect SPNs in future experiments.

      We thank the reviewer for the positive comments.

      Weaknesses:

      The authors could characterize better the reliability of their experimental predictions and the description of the parameters of some of the simulations.

      In addition to the descriptions in the Methods, we have provided code implementing the key features of our simulations, which should contribute to reproducibility of our results.

      The authors propose some ideas about the specificity of the striatal efferent inputs but should better highlight that this is a key feature of the model whose anatomical implementation has yet to be resolved.

      We have clarified in the Discussion section “Biological substrates of striatal efferent inputs” that these represent assumptions or predictions that have not yet been demonstrated experimentally.

      Reviewer #3:

      Summary:

      This paper points out an inconsistency between the role of the striatal spiny neurons projecting to the indirect pathway (iSPNs) and the synaptic plasticity rule of those neurons, which express dopamine D2 receptors, and proposes a novel, intriguing mechanism in which iSPNs are activated by the efference copy of the chosen action that they are supposed to inhibit.

      The proposed model was supported by simulations and analysis of the neural recording data during spontaneous behaviors.

      Strengths:

      Previous models suggested that the striatal neurons learn action-value functions, but how the information about the chosen action is fed back to the striatum for learning was not clear. The authors pointed out that this is a fundamental problem for iSPNs, which are supposed to inhibit specific actions and whose synaptic inputs are potentiated with dopamine dips.

      The authors propose a novel hypothesis that iSPNs are activated by efference copy of the selected action which they are supposed to inhibit during action selection. Even though intriguing and seemingly unnatural, the authors demonstrated that the model based on the hypothesis can circumvent the problem of iSPNs learning to disinhibit the actions associated with negative reward errors. They further showed by analyzing the cell-type specific neural recording data by Markowitz et al. (2018) that iSPN activities tend to be anti-correlated before and after action selection.

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not correct to call action-value learning using the externally selected action “off-policy.” Both the off-policy algorithm Q-learning and the on-policy algorithm SARSA update the action value of the chosen action, which can differ from the greedy action implied by the present action values. In standard reinforcement learning terminology, on-policy versus off-policy concerns the action in the subsequent state: whether the update uses the next action value of the (to-be-)chosen action or that of the greedy choice, as in equation (7).

      It is worth noting that this paper suggested that dopamine neurons encode on-policy TD errors: Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9, 1057-63. https://doi.org/10.1038/nn1743.

      We regret that we do not completely follow the reviewer’s comment. We use “off-policy” to refer to the fact that, considered in isolation, the basal ganglia reinforcement learning system that we model learns a target policy that may be distinct from the behavioral policy of the organism as a whole.
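
      To make the terminology under discussion concrete, the following minimal sketch contrasts the two textbook update targets the reviewer refers to. It is an illustration only, not the manuscript's code: the function names, the dictionary-based Q table, the state and action labels, and the values of `gamma` and `alpha` are all assumptions introduced here.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: bootstrap from the action actually chosen next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.9):
    """Off-policy target: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(q[next_state].values())


def td_update(q, state, action, target, alpha=0.1):
    """Both algorithms update the value of the action that was actually taken."""
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action table (values chosen for illustration).
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
on_policy = sarsa_target(q_table, reward=1.0, next_state="s1", next_action="left")
off_policy = q_learning_target(q_table, reward=1.0, next_state="s1")
```

      In this sketch the two targets differ only in which next-state action value they bootstrap from; the updated entry is the chosen action's value in both cases.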

      It is also confusing to contrast TD learning and Q-learning, as the latter is considered one type of TD learning. The TD error signal based on the state value function (6) depends implicitly on the chosen action a_{t−1}, through r_t and s_t, via the reward and state transition functions.

      We agree that this was confusing. We have therefore changed the places in our paper where we intended to refer to “TD learning of a value function V (s)” to specifically mention V (s), rather than just “TD learning.”
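
      For reference, the three error signals being distinguished in this exchange can be written in their standard textbook forms. The indexing below follows the reviewer's notation (action a_{t−1} taken in state s_{t−1} yields r_t and s_t) and is an assumption; the manuscript's Eqs. (6) and (7) may be written differently. The discount factor \gamma is likewise a symbol introduced here.

```latex
\begin{align}
\delta_t^{V} &= r_t + \gamma\, V(s_t) - V(s_{t-1})
  && \text{(TD learning of } V(s)\text{)} \\
\delta_t^{\mathrm{SARSA}} &= r_t + \gamma\, Q(s_t, a_t) - Q(s_{t-1}, a_{t-1})
  && \text{(on-policy)} \\
\delta_t^{Q} &= r_t + \gamma \max_{a} Q(s_t, a) - Q(s_{t-1}, a_{t-1})
  && \text{(off-policy)}
\end{align}
```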

      It is not clear why interferences of the activities for action selection and learning can be avoided, especially when actions are taken with short intervals or even temporal overlaps. How can the efference copy activation for the previous action be dissociated with the sensory cued activation for the next action selection?

      The non-interference arises from the orthogonality of the difference (action selection) and sum (efference copy) modes, as described in Figure 3. However, we agree with the reviewer that the problem of temporal credit assignment, when many actions are taken before reward feedback is obtained, is present in our model, as in any standard RL model.
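
      The orthogonality argument can be illustrated with a small sketch. It assumes, for illustration only, that action selection reads out the dSPN-minus-iSPN difference for each action and that the efference copy adds the same input to the dSPN and iSPN of the chosen action; the function names and activity values are hypothetical and are not taken from the manuscript's code.

```python
def difference_mode(dspn, ispn):
    """Assumed action-selection readout: dSPN minus iSPN activity, per action."""
    return [d - i for d, i in zip(dspn, ispn)]


def sum_mode(dspn, ispn):
    """Assumed efference-copy readout: dSPN plus iSPN activity, per action."""
    return [d + i for d, i in zip(dspn, ispn)]


def add_efference_copy(dspn, ispn, chosen, amplitude=1.0):
    """Add an identical input to the dSPN and iSPN of the chosen action."""
    dspn, ispn = list(dspn), list(ispn)
    dspn[chosen] += amplitude
    ispn[chosen] += amplitude
    return dspn, ispn


# Hypothetical activity for three candidate actions.
dspn = [0.5, 0.25, 0.75]
ispn = [0.25, 0.5, 0.25]
dspn_after, ispn_after = add_efference_copy(dspn, ispn, chosen=2)

diff_before = difference_mode(dspn, ispn)
diff_after = difference_mode(dspn_after, ispn_after)
sum_before = sum_mode(dspn, ispn)
sum_after = sum_mode(dspn_after, ispn_after)
```

      Under these assumptions the common input changes only the sum mode, so the difference-mode readout used for action selection is unchanged.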

      Although it may be difficult to single out the neural pathway that carries the efference copy signal to the striatum, it is desirable to consider its requirements and different possibilities. A major issue is that the time delay from actions to reward feedback can be highly variable.

      An interesting candidate is the long-latency neurons in the CM thalamus projecting to striatal cholinergic interneurons, which are activated following low-reward actions: Minamimoto T, Hori Y, Kimura M (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798-801. https://doi.org/10.1126/science.1109154.

      We are grateful for the interesting suggestion and reference, which we have added to the manuscript. However, we note that the issue of delayed reward feedback may also be partially addressed by using a sufficiently long eligibility trace.
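
      How an eligibility trace can bridge a delay between activity and reward feedback is sketched below. This is a generic, minimal illustration under stated assumptions (an exponentially decaying trace, a scalar weight, and hypothetical parameter names `decay` and `lr`), not the learning rule used in the manuscript.

```python
def run_trace(presynaptic_events, prediction_errors, decay=0.9, lr=0.1):
    """Accumulate the weight change produced by a decaying eligibility trace.

    presynaptic_events: list of activity values, one per time step
    prediction_errors: dict mapping time step -> reward prediction error
    """
    trace, weight_change = 0.0, 0.0
    for t, activity in enumerate(presynaptic_events):
        # The trace decays each step and is refreshed by new activity.
        trace = decay * trace + activity
        # Plasticity occurs only when an error signal arrives.
        delta = prediction_errors.get(t, 0.0)
        weight_change += lr * delta * trace
    return weight_change


# Activity at step 0, feedback arriving three steps later.
events = [1, 0, 0, 0]
feedback = {3: 1.0}
slow_decay = run_trace(events, feedback, decay=0.9)
fast_decay = run_trace(events, feedback, decay=0.5)
```

      In this sketch a more slowly decaying (longer) trace retains more credit for the earlier activity when the delayed feedback arrives.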

      In the paragraph before Eq. (3), Eq. (1) should be Eq. (2) for the iSPN.

      Corrected.

    1. Author response:

      eLife Assessment

      This manuscript offers important insights into how polyphosphate (polyP) influences protein phase separation differently from DNA. The authors present compelling evidence that polyP distinguishes between protein conformational states, leading to diverse condensate behaviors. However, differences in charge density between polyP and DNA complicate direct comparisons, and the extent to which polyP-driven phase transitions reveal initial protein states remains unclear. Addressing these concerns would strengthen the manuscript's impact for researchers interested in biomolecular condensates, protein dynamics, and stress response mechanisms.

      We thank the editorial team for the favorable assessment. We do, however, contest the specific point on the difference in charge density. We have already performed experiments wherein a higher concentration of DNA is used to match the overall ‘concentration of charges’ in the experiments with polyP (see Figure S6), and we do not observe any differences in the maturation behavior with DNA, i.e., we see only dissolution at both higher and lower concentrations of DNA. Charge density (i.e., the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer, which is naturally different between DNA and polyP. In fact, the primary result of our work is our observation that polyP can discern the starting ensembles more efficiently, likely through actively engaging and interacting with the ensemble, while DNA appears to be a passive player.

      Reviewer #1 (Public review):

      Summary:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negative polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the enzymes. Specifically, they observe the formation of polyproline II-like structures within two tested enzyme variants in the presence of polyP. Together their results provide a unique insight into the physical and structural mechanism by which two unique negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcome addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Strengths:

      Throughout the article, the authors used the correct techniques to probe physical changes within proteins that can be directly linked to phase transition behaviors. Their rigorous experiments create a clear picture of what occurs at the molecular level when CytR and FruR are exposed to either DNA or polyP, which are unique, highly negatively charged biopolymers found within bacteria. This work provides a new view of mechanisms by which bacteria can regulate the cytoplasmic organization upon the induction of stress. Furthermore, this is likely applicable to mammalian and plant cells and likely to numerous proteins that undergo condensation with nucleic acids and other charged biopolymers.

      Weaknesses:

      The biggest weakness of this study is that it compares the phase behavior of enzymes driven by negatively charged polymers that have intrinsic differences in net charge and charge density. Because these properties are extremely important for controlling phase separation, any differences may result in the observed phase transitions driven by DNA and polyP. The authors should perform an additional experiment to control for these differences as best they can. The results from these experiments will provide additional insight into the importance of charge-based properties for controlling phase transitions.

      We thank the reviewer for providing a positive review of our work. On the comment related to the final paragraph, we note that we have already conducted an experiment with a higher DNA concentration (11.24 µM) to explore if the concentration of charges plays any significant role. The results of this experiment are presented in Figure S6. We observe that even at a higher DNA concentration, the condensates dissolve over time. Therefore, the difference in the maturation behavior of condensates with varying initial protein ensembles is due to the nature of polyP (likely through its enhanced flexibility). 

      Reviewer #2 (Public review):

      Summary:

      In this study, Goyal et al demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate against initial protein ensembles with different conformational heterogeneity, structure, and compactness. The authors further show that the protein native ensemble is vital on whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      Strengths:

      This is a thoroughly conducted study that provides an alternative route for inducing phase separation that is more informative on the initial protein ensemble involved. This is particularly useful and a complementary means to investigate the role played by protein dynamics and plasticity in phase transitions. The authors use an appropriate set of techniques to investigate unique phase transitions within proteins induced by polyphosphates. An alternative protein system is used to corroborate their findings that the unique assemblies induced by polyphosphates when compared to DNA are not restricted to a single system. The work here is well-documented, easy to interpret, and of relevance for the condensate community.

      Weaknesses:

      The major weakness of this manuscript is that it is unclear if the information on the initial protein conformational ensemble can be determined solely from the assembly and maturation behavior and the discrimination abilities of polyphosphates. In both systems studied (CytR and FruR), polyphosphate discriminates and results in unique assemblies and maturation behaviors based on the initial protein ensemble. However, it seems the assembly and maturation behavior are not a direct result of the degree of conformational dynamics and plasticity in the initial protein. In the case of CytR, the fully-folded system forms condensates that resolubilize, while the highly disordered state immediately aggregates. Whereas, in the case of FruR, the folded state induces spontaneous aggregation, and the more dynamic, molten globular, system results in short-lived condensates. These results seem to suggest the polyphosphates' ability to discriminate between the initial protein ensemble may not be able to reveal what that initial protein ensemble is unless it is already known.

      We thank the reviewer for providing constructive comments on our work. On the final paragraph: we agree that the outcome does not provide information on the nature of the starting ensemble. As of now, our experimental results are primarily observations on questions related to maturation outcomes when protein ensembles of varying structure, compactness and stability interact with polyP. If there are differences in the native ensemble due to mutations (which at times cannot be revealed by ensemble probes), polyP appears to discern them more efficiently than DNA.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc) while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region using a viral vector to deliver channelrhodopsin (CamKII promoter). In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion - the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We initially considered two approaches. The first was to look at specific projections to the motor regions, focusing on the MLR. The second was to utilize a whole-brain analysis, which is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that examining the full connectome was a reasonable starting point.

      The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were taken from the Allen Brain Atlas terminology and presented as abbreviations. We have added two new figures focusing on motor regions to make the information more comprehensible (new Figures 4 and 5) and rewrote the connectomics section to make it easier to understand.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point to help simplify the whole-brain results. We have presented the motor-related inputs and outputs as part of a new figure in the main paper (Figure 5) and added accompanying text in the results section. We have also updated the correlation matrices to concentrate on motor regions (Figure 4). This highlights possible therapeutic pathways. We have also enhanced our discussion of these motor-related pathways. We have retained the entire dataset and added it to our data repository for those interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease and a new perspective on the preservation of dopaminergic cells in A13, despite the SNc degeneration, and the plastic changes to input/output matrices. To gain inspiration for a more straightforward reanalysis and discussion of the results, I recommend the authors refer to the paper titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon" from the David Kleinfeld laboratory. This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice. As pointed out, Kleinfeld’s group presented their data in a nice, focused way. For the connectomic piece, we have added Figure 5, which provides a better representation than our previous submission.

      (2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, and total distance could reveal how A13 is influencing locomotion, particularly after 6-OHDA lesions. The laboratory of Whelan has a deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive for inferring insights into the neural circuits driving movement. Along the same lines, examining features like the frequency or power of stimulation related to walking patterns may help elucidate whether A13 is engaging with the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We have reorganized Figure 3 to highlight the metrics by separating the 6-OHDA from the Sham experiments (3F-J, which highlights distance travelled, average speed and duration). We have also added additional text to highlight these metrics better in the text. We have relabelled Supplementary Figure S3, which presents reaction time as latency to initiate locomotion and updated the main text to address the reviewers' points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that if the remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has been previously reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that if optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you - we have added this reference. It is helpful as Grossman did stimulate the zona incerta in the cat and elicit locomotion, suggesting that stimulation of the area in normal mice has external validity. Grossman’s results prompted a later clinical examination of the zona incerta, but it concentrated on the zona incerta regions close to the subthalamic regions (Ossowska 2019), further caudal to the area we focused on. Chen et al. (2023) targeted the area in the lateral aspect of central/medial zona incerta, formed by dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed in our work that focused on CamKII neurons has not been observed with chemogenetic, optogenetic activation or with photoinhibition of GABAergic central/medial ZI (Chen et al. 2023).

      GABAergic activation of mZI to Cuneiform projections (Sharma et al. 2024) also did not produce thigmotactic behavior. We have added these points to the discussion.

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, if YFP and CHR2-YFP neurons express dopamine (TH) within the A13 region (Fig. 2), there is also a large population of transduced neurons within and outside of the A13 region that do not, thus suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII transfection of the A13 region was extremely effective in promoting locomotor activity, which was critical for our work in exploring its possible therapeutic potential. We have since quantified the cell number and found that the c-fos cell number was increased following ChR2 activation. There is evidence of TH activation, but the data suggest that other cell types contribute. C-fos alone is a blunt tool for assessing specificity; it is better at showing overall photostimulus efficacy, which we have demonstrated. Moreover, there is evidence that cell types are not purely dopaminergic, with GABA co-localized (Negishi et al. 2020). We acknowledge that specific viral approaches that target the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools to target A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while helpful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate & dopamine, for example) may be one solution, as we do not expect that a single transmitter-specific pathway would work as well as broad targeting of the A13 region. Our recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than control of ongoing locomotor parameters (Sharma et al. 2024). Recent work shows a positive valence effect of dopamine A13 activation on motivated food-seeking behavior, which differs from consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests differing effects of GABA compared to DA and/or glutamatergic cell types, consistent with our effects of stimulating CamKII. The discussion has been updated.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Generally, papers using tissue-clearing imaging approaches have low sample sizes due to technical complexity and challenges. The technical challenges of obtaining these data were substantial in both collection and analysis. There are multiple technical complexities arising from dual injections (A13 and MFB coordinates) and targeting the area correctly. The A13 region is difficult to target as it spans only around 300 µm in the anterior-posterior axis. Clearing the brain takes weeks, light-sheet imaging also takes time, and analyzing the tissue using whole-brain quantification is labor-intensive, especially given the lack of a standardized analysis pipeline for atlas registration, signal segmentation, and quantification. The field is still relatively new, requiring additional time to refine pipelines.

      Correlation matrices are often used in analyzing connectivity patterns on a brain-wide scale, as they can identify any observable patterns within a large amount of data. We used correlation matrices to display estimated correlation coefficients between the afferent and efferent proportions from one brain subregion to another across 251 brain regions in total in a pairwise manner (not for hypothesis testing). We provided descriptive statistics (mean and error bars) in the original Figure 5C and G. As mentioned in comments for Reviewer 1, we have now presented the data in revised Figures 4 and 5, which focus specifically on motor-related pathways to provide information on possible pathways. This has simplified the correlation matrices and highlighted the differences in the 6-OHDA efferent data especially. As suggested, raw values are shared in a supplemental file on our data repository.

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. We aim to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent in the context of potential clinical applications, as the overall input and output to the region fundamentally dictate the significance of the A13 region in lesioned nigrostriatal models. We agree with the reviewer that the changes certainly can be independent of A13; however, the fact that there was a significant change in the connectome post-6-OHDA injection and striatonigral degeneration is in and of itself important to document. We have added a sentence acknowledging this limitation to the discussion.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it was prudent to initially examine both afferent and efferent pathways using a brain-wide approach. For instance, without employing this methodology, the potential significance of cortical interconnectivity to the A13 region might not have been fully appreciated. As mentioned previously, we will place additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will underscore specific motor-related targets for future exploration, employing optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      Our study has not specifically targeted neurons that express dopaminergic, glutamatergic, or GABAergic properties (refer to earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not result in a pure transmitter phenotype. For instance, evidence suggests co-localization of dopamine neurons with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, research by Deisseroth and colleagues (Romanov et al. 2017) indicates the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work in the context of a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be utilized is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations in a non-specific manner.

      While optogenetic stimulation therapy is longer term, using CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020). We have added to the discussion.

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling in terms of afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They make use of an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but there are several general issues that reduce the value of the paper in its current form, and a number of specific, more minor ones. Also, some suggestions, that may improve the paper compared to its recent form, come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in comments for Reviewer 1, we have performed additional analysis and present this in Figure 5. We have also revised Figure 4, focusing on motor regions. Our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways that are involved in different or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals studied with the anatomic study (essentially because the optrode would have significantly disrupted the connectome we are measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral set and the anatomical set were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to target the afferent and efferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was utilized to photostimulate the A13 region; hence combining the two populations was not feasible. We were also concerned that the optrode itself would interfere with connectomics. A lower number of animals were used for the whole-brain work due to technical limitations described earlier. We have now provided additional information regarding numbers in all figures and the text. Using Spearman’s correlation analysis, we found afferent and efferent proportions across animals to be consistent, with an average correlation of 0.91, which is reported in Figure S6.
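
      The consistency check described above (pairwise Spearman correlation of regional proportions across animals, then averaged) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the function names, the animal labels, and all numbers are hypothetical, and the sketch uses only the Python standard library.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(profiles):
    """Average Spearman correlation over every pair of animals."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical regional proportions for three animals (made-up numbers,
# one value per brain region, same region order in every list).
example_profiles = {
    "animal_1": [0.40, 0.25, 0.20, 0.10, 0.05],
    "animal_2": [0.38, 0.27, 0.18, 0.12, 0.05],
    "animal_3": [0.35, 0.30, 0.15, 0.05, 0.15],
}
consistency = mean_pairwise_spearman(example_profiles)
```

      A value of `consistency` close to 1 would indicate that the rank order of regional proportions is similar across animals, which is the sense in which the average correlation is used here.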

      While the authors provide evidence that photoactivation of the A13 is sufficient in driving locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find that there seems to be a difference between Sham vs 6-OHDA concerning the effects of photoactivation of the A13. Because of these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. The lack of locomotion observed in 6-OHDA models can be reversed by A13 region photostimulation. Therefore, this is a reversal of a loss of function, in this case. However, the increase in turning represents a gain of function. We have highlighted this as suggested in the discussion.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review from Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell-specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al, Science 2021)? Please comment.

      Thank you for the valuable comments. They have been incorporated into the discussion.

      Our study highlights a parallel motor pathway provided by the A13 region that remains intact in 6-OHDA mice and can be sufficiently driven to rescue the hypolocomotor pathology observed in the OFT and overcome bradykinesia and akinesia. The photoactivation of ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of overall motor asymmetry against the lesioned side. The effects of DBS are rather complex, ranging from micro-, meso-, to macro-scales, and involving activation, inhibition, informational lesioning, and network interactions. This could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among the studies (see the review by Ossowska 2019). Also, the DBS studies targeting the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue, in general, are always limited by current spread and lower thresholds of activation of axons (e.g., axons of passage), both of which can reduce the specificity of the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged in overcoming some of the limitations in targeting with conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advancements. They devised burst stimulation to facilitate population-specific neuromodulation within the external globus pallidus. Moreover, they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons activated by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, it will be necessary to perform many more preclinical experiments, and tuning of DBS parameters could be facilitated by optogenetic stimulation in these murine models. We have added to the discussion.

      In a recent study, Jeon et al (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence on the topographically graded organization of STN afferents and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ ZI in total?

      The ZI can be subdivided into four subregions in the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc) regions. The dorsal and ventral ZI are also referred to together as central/medial/intermediate ZI. There are topographical gradients in different cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization among the inputs and outputs of GABAergic (VGAT) populations across four ZI subregions. Given that the A13 region encompasses a smaller portion (the medial aspect) of both rostral and medial/central ZI (three of four ZI subregions) and coexpresses VGAT, the A13 region likely falls under the rostral and intermediate medial ZI dataset found in Yang et al. (2022). With our data, we would not be able to capture the breadth of topographical organization shown in Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      We have completed this analysis. The data is presented in Figure 2F, where we show increased c-fos intensity with photoactivation. We observed an increase in the number of cells activated in the A13 region. However, we did not definitively see increases in TH+ cells, suggesting a heterogeneous set of neurons responsible for the effects—possibly glutamatergic neurons.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      We have added text and a new graph.

      (2) Refer to Figure 3. In the main text (page 5) when describing the animal with 6-OHDA the wrong panels are indicated. It is indicated in Figure 2A-E but it should be replaced with 3A-E.

      Please do that.

      Done, and we have updated the figure to improve readability, by separating the 6-OHDA findings from sham in all graphs.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      Overall, there is quite a bit of evidence that the zona incerta is a hub for afferent/efferents.

      Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang (2022) illustrates that the zona incerta shows multiple inputs to GABAergic neurons and outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions such as hunting, exploratory locomotion, and integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). The introduction has been updated.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have been written “rostromedial” ZI. The A13 is located in the medial aspect of the rostromedial ZI. The introduction has been updated.

      Page 2, para 3: Li et al (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn, 2022 transiently blocked the dopaminergic transmission; they never used 6-OHDA.

      Please correct through the text.

      Corrected.

      Page 2, para 4: the A13 connectome encompasses the cerebral cortex,... MLR. The MLR is a functional region, correct this for the CNF and PPN.

      Corrected.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      This has been corrected.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. The issue of viral spread is complex and depends on factors including tissue type, serotype, and promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters, injecting 150nL, whereas we used AAV-DJ, injecting 36.8nL. AAV-DJ is a hybrid viral type consisting of multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV viral constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ could effectively transduce non-human primate neurons (Watanabe et al. 2020). We have addressed the issue of neurons recruited earlier, provided c-Fos quantification, and provided a new supplementary figure showing viral spread (Figure S1).

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We will provide additional information regarding viral spread, ferrule tip placement, and c-fos cell counts. This has been done in Figure 2 and we also present a new Figure S1 where we have quantified the viral spread.

      Page 5, 1st para: Double-check the references, as not all of them are 6-OHDA injections in the MFB.

      Corrected. Removed Kiehn reference.

      Page 5, 1st para, 4th line: Replace ferrule with optical canula or fiber.

      Done

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Done

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1C, 1-way RM ANOVA: F(5,25) = 0.486, P = 0.783). This has been updated in the text.

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off-on boxes are 1 minute long. The error is corrected in the legend. Great suggestion for F-I: they have been moved ahead of the summary figures. We have also updated the new Fig. 3F-I, J, L, and M to make the differences between the 6-OHDA and sham graphs easier to visualize. The stats do indicate a significant difference during the stimulation phase. We have added group labels and reorganized the figure, which is now much easier to read.

      Fig. 3L-M, what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of stats in Fig. 3B-E and 3-J-M. What are time points a, b, and c referring to?

      We have renamed the figure labels to be more intuitive. We have standardized the presentation of statistics in the figure and eliminated the a, b, c nomenclature. We have also updated the caption to provide descriptions of the tests in Fig. 3L-M.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      We have removed these plots from Figure S2. We now present the Baseline to Pre values across the experimental timespan to illustrate the fact that distance travelled returned to baseline values for all trials conducted.

      Fig. S2B: add the statistical marker.

      We have removed this from Figure S2.

      Page 7, para 1, line 8: to add "in comparison to 6-OHDA-YFP and YFP mice" to during photostimulation... (Figure 3E).

      Done

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Done

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      Added the timeframe to this sentence.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Done.

      Page 8, para 3, line 4. Double-check the reference.

      Corrected.

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We have updated the figures for connectivity throughout the manuscript. Overall, there are new Figures 4 and 5 in the main text. We also provide a revised Supplementary Figure 8. Unfortunately, we could not do that experiment regarding local connectivity. In light of our new work (Sharma et al. 2024), it is clear that this will be critical going forward.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      We have provided more information about the viral spread in the text and Supplementary Figure 1. The functional and anatomical experiments are separate, which we realize caused confusion. We have mentioned analysis time after 6-OHDA and inserted this into the text.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We have provided a new Figure 5 where we present quantification per region, and the correlation matrices have been updated in Figure 4. We have also focused on motor regions as mentioned earlier. We also provide examples of raw regions in Supplementary Figure 8. Raw values are shared on our data repository.

      Page 10, para 1, line 1: add "afferent" to "changes in -afferent and- projection patterns."

      Done

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Done

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only info at 1 month.

      Done

      Page 10, para 2: the section should be written in the past tense.

      Done

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Done

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (mean and error bars) in Figure 5A. As mentioned in comments for Reviewers 1 and 2, we have revised Figure 5 to present data focusing on motor-related pathways to provide clarity. In addition, absolute values are shared on our data repository.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labelling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each animal. This figure has been updated as reviewers requested.
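
      The normalization described above (each regional count divided by the total count in that animal, followed by a comparison of group-average proportions) can be sketched as follows. This is only an illustration under stated assumptions: the function names, region labels, and counts are hypothetical, every animal is assumed to report the same set of regions, and the sketch is not the code used to produce the figures.

```python
def regional_proportions(counts):
    """Normalize one animal's regional counts by that animal's total count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("animal has no labelled cells or fibres")
    return {region: n / total for region, n in counts.items()}


def mean_proportions(animals):
    """Average the normalized proportions across the animals of one group."""
    props = [regional_proportions(c) for c in animals]
    regions = props[0].keys()  # assumes every animal lists the same regions
    return {r: sum(p[r] for p in props) / len(props) for r in regions}


def proportion_change(sham, lesioned):
    """Difference in group-average proportion (lesioned minus sham) per region."""
    sham_mean = mean_proportions(sham)
    lesion_mean = mean_proportions(lesioned)
    return {r: lesion_mean[r] - sham_mean[r] for r in sham_mean}


# Made-up counts: the animals differ in total labelling, but normalization
# removes that difference before the groups are compared.
sham_animals = [
    {"cortex": 500, "striatum": 300, "midbrain": 200},
    {"cortex": 1000, "striatum": 600, "midbrain": 400},
]
lesioned_animals = [
    {"cortex": 520, "striatum": 280, "midbrain": 200},
    {"cortex": 260, "striatum": 140, "midbrain": 100},
]
change = proportion_change(sham_animals, lesioned_animals)
```

      Because the proportions in each animal sum to one, the changes across regions sum to zero, so an increase in one region's share necessarily appears as a decrease elsewhere. This is one reason absolute values are a useful complement to the normalized data.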

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. To provide further clarity, we have provided absolute values as suggested in our new supplemental table.

      Figure 6 is inaccurate and unnecessary.

      Figure 6 has been removed.

      Discussion

      Although interesting, the discussion is too long.

      The discussion has been reduced by about three quarters of a page.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      Added.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors -DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January):1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December):102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Sharma, Sandeep, Cecilia A. Badenhorst, Donovan M. Ashby, Stephanie A. Di Vito, Michelle A. Tran, Zahra Ghavasieh, Gurleen K. Grewal, Cole R. Belway, Alexander McGirr, and Patrick J. Whelan. 2024. “Inhibitory Medial Zona Incerta Pathway Drives Exploratory Behavior by Inhibiting Glutamatergic Cuneiform Neurons.” Nature Communications 15 (1): 1160.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April):144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv: The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represent the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I understand that your main objective regarding glycation rate and lifespan was to analyse the species resistance to glycation with respect to lifespan, while factoring out the species-specific variation in blood glucose level. However, I still believe that the absolute glycation level (i.e., not controlled for blood glucose level) may also be important for the evolution of lifespan. Given that blood glucose is positively related to both glycation and lifespan (although with a plateau in the latter case), lifespan could possibly be positively correlated with absolute glycation levels. If significant, that would be an interesting and counterintuitive finding, which would call for an explanation, thereby potentially stimulating further research. If not significant, it would show that long-lived species do not have higher glycation levels, despite having higher blood glucose levels, thereby strengthening your argument about higher resistance of long-lived species to glycation. So, in my opinion, the inclusion of an additional model of glycation level on life-history traits, without controlling for blood glucose, is worth considering.

      We now include this model as supplementary material and refer to it in several parts of the text, where we also address some of the issues discussed here.

      Lines 230-231: Please, provide a citation for these GVIF thresholds

      We include it now.

      Figure 3: I think that showing both glucose and glycation rate on the linear scale, rather than log scale, would better illustrate your conclusion - the slowing rise of glycation rate with increasing glucose levels.

      That is a good point, although it may confuse readers to see a graph that presents the data differently from the models. Perhaps showing both graphs (as 3.A and 3.B) would solve this?
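
      The reviewer's point can be illustrated with a short derivation. This is only a sketch: it assumes the fitted relationship is a straight line on the log-log scale with a slope between 0 and 1, and the symbols G (glycation), x (glucose), a (intercept) and b (slope) are introduced here for illustration rather than taken from the manuscript.

```latex
\log G = a + b \log x
\quad\Longrightarrow\quad
G = e^{a} x^{b},
\qquad
\frac{dG}{dx} = b\, e^{a} x^{b-1}.
```

      Under that assumption (0 < b < 1) the derivative shrinks as x grows, so the curve flattens on the linear scale even though the same relationship appears as a straight line on the log-log scale.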

      Figure 4. I recommend stating in the caption that the whiskers do not represent interquartile ranges (a standard option in box plots) but credible intervals as mentioned in the current version of the public author response.

      We apologise for the omission; this is now stated in the caption. Note that the boxes still represent the interquartile ranges of the posterior distributions, while the whiskers represent the credible intervals.
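
      As a minimal sketch of the plotting convention described above, the box and whisker limits can be computed directly from posterior draws. The 95% level, the equal-tailed interval, the function names and the example draws are all assumptions made for illustration; the source does not state which interval type or level was used.

```python
# Sketch: summarise posterior draws with a box (IQR) and whiskers (credible interval).
# The 95% equal-tailed interval and the example draws are assumptions, not study values.

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one draw")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def box_summary(draws, credible_level=95.0):
    """Return box limits (quartiles) and whisker limits (equal-tailed credible interval)."""
    ordered = sorted(draws)
    tail = (100.0 - credible_level) / 2.0
    return {
        "whisker_low": percentile(ordered, tail),
        "box_low": percentile(ordered, 25.0),
        "median": percentile(ordered, 50.0),
        "box_high": percentile(ordered, 75.0),
        "whisker_high": percentile(ordered, 100.0 - tail),
    }


# Hypothetical posterior draws for a single coefficient: 0, 1, ..., 100.
example_draws = list(range(101))
summary = box_summary(example_draws)
```

      With this convention the whiskers are always at least as wide as the box, which is why a caption is needed to distinguish them from the usual 1.5 × IQR whiskers.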

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript, including the description of title, results and methods sections.

      Reviewer #2 (Public review):

      Summary:

      This work is towards the development of nonantibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in mouse C. difficile infection model. The authors conclude CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/ Weaknesses:

      The strategy employed by the authors is sound although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented does not convincingly demonstrate that CAPE attenuates C. difficile disease and the mechanism of action of CAPE is not convincingly defined. The following points highlight the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      Thanks for your comments. The cytotoxicity assay was employed to directly assess the protective capacity of CAPE against cell death induced by TcdB. Our observations at 1 and 12 h post-TcdB exposure revealed that CAPE effectively mitigated the toxic effects of TcdB at both time points, demonstrating its potent protective role. Please see Figure S1.

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      Thanks for your constructive comments. According to your suggestion, we administered TcdB 2 h after pretreatment with CAPE. The outcomes demonstrated that CAPE pretreatment significantly enhanced the survival rate of the intoxicated mice, confirming that CAPE retains its antitoxin efficacy during the infection process. Please see Figure S2.

      (3) CAPE does not bind to TcdB with high affinity as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Auto processing is also affected by CAPE indicating CAPE is binding non-GTD sites on TcdB.

      Thanks for your comments. Our findings indicate that the GTD domain is a critical binding site for CAPE. CAPE exerts its protective effects at multiple stages of TcdB-mediated cell death, including inhibiting TcdB's self-cleavage and blocking the activity of GTD, thereby preventing the glycosylation modification of Rac1 by TcdB.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      Thanks for your comments. Our comparative analysis revealed a notable difference in the body weight of mice on the third day post-infection (Figure 6B). The dry/wet stool ratio exhibited a comparable pattern, suggesting that treatment with CAPE ameliorated C. difficile-induced diarrhea to a significant degree (Figure 6C).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      Thanks for your comments. Edema, inflammatory cell infiltration, and epithelial damage served as key evaluation metrics. Statistical analysis revealed that the pathological scores of mice treated with CAPE were markedly reduced compared to those in the model group (Figure 6F).

      (6) The reduction in C. difficile colonization is interesting. It is unclear if this is due to antitoxin activity and/or due to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed that has CAPE treatment without C. difficile infection or infection with an atoxicogenic strain.

      The observed reduction in C. difficile fecal colonization following drug treatment may be attributed to CAPE's antitoxin properties or to its capacity to modify the intestinal microbiota and metabolites. These two mechanisms likely work in tandem to combat CDI. CDI is primarily triggered by the toxins A (TcdA) and B (TcdB) secreted by the bacterium. Certain therapies, including monoclonal antibodies like bezlotoxumab, target CDI by neutralizing these toxins, thereby mitigating gut damage and subsequent C. difficile colonization(1,2). The establishment of C. difficile in the gut is intricately linked to the equilibrium of the intestinal microbiota. Although antibiotic treatments can inhibit C. difficile growth, they may also disrupt the microbial balance, potentially facilitating the overgrowth of other pathogens. Consequently, interventions such as fecal microbiota transplantation (FMT) are designed to reestablish gut flora balance and thereby decrease C. difficile colonization(3,4). Moreover, the administration of probiotics and prebiotics is considered to reduce C. difficile colonization by modifying the gut environment(5,6).

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      The uptake and distribution of melatonin within the body vary with the dose administered. For instance, in rats, the bioavailability of melatonin following administration was found to be 53.5%, whereas in dogs, bioavailability was nearly complete (100%) at a dose of 10 mg/kg, yet it decreased to 16.9% at a lower dose of 1 mg/kg(7). These data suggest that the absorption of melatonin differs across animal species and depends on the dose administered. They also indicate that bioavailability tends to be higher at higher doses, implying that a dose of 200 mg/kg should be adequate to achieve the desired concentration in the body post-administration.
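
      The reviewer's "~400 micromolar" figure for 100 µg/ml of melatonin can be checked with a simple unit conversion. The sketch below is illustrative only: the function name is hypothetical, and the molar mass of melatonin (C13H16N2O2, about 232.28 g/mol) is a standard value that does not appear in the source.

```python
# Sketch: convert a mass concentration (ug/mL) to a molar concentration (uM).
# The molar mass of melatonin (C13H16N2O2, ~232.28 g/mol) is supplied here
# as a standard value; it is not stated in the review or the response.

MELATONIN_MOLAR_MASS_G_PER_MOL = 232.28


def ug_per_ml_to_micromolar(concentration_ug_per_ml, molar_mass_g_per_mol):
    """1 ug/mL equals 1 mg/L; (mg/L) / (g/mol) gives mmol/L, and x1000 gives umol/L."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return concentration_ug_per_ml / molar_mass_g_per_mol * 1000.0


# 100 ug/mL of melatonin, the concentration questioned by the reviewer.
melatonin_um = ug_per_ml_to_micromolar(100.0, MELATONIN_MOLAR_MASS_G_PER_MOL)
```

      The result is roughly 430 µM, consistent with the reviewer's rounded estimate. Whether such a concentration is reached in the gut lumen is a separate pharmacokinetic question that this conversion does not address.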

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. difficile?

      We tested CAPE in a MIC assay against C. difficile and also assessed its effects on the sporulation capacity of C. difficile and on the secretion level of TcdB. The findings revealed that CAPE markedly repressed tcdB transcription at a concentration of 16 μg/mL and effectively suppressed the growth and sporulation of C. difficile BAA-1870 at a concentration of 32 μg/mL. Please see Figure S3.

      References:

      (1) Skinner AM, et al. Efficacy of bezlotoxumab to prevent recurrent Clostridioides difficile infection (CDI) in patients with multiple prior recurrent CDI. Anaerobe. 2023 Dec; 84: 102788.

      (2) Wilcox MH, et al. Bezlotoxumab for Prevention of Recurrent Clostridium difficile Infection. N Engl J Med. 2017 Jan 26;376(4):305-317.

      (3) Khoruts A, Sadowsky MJ. Understanding the mechanisms of faecal microbiota transplantation. Nat Rev Gastroenterol Hepatol. 2016 Sep;13(9):508-16.

      (4) Khoruts A, Staley C, Sadowsky MJ. Faecal microbiota transplantation for Clostridioides difficile: mechanisms and pharmacology. Nat Rev Gastroenterol Hepatol. 2021 Jan;18(1):67-80.

      (5) Mills JP, Rao K, Young VB. Probiotics for prevention of Clostridium difficile infection. Curr Opin Gastroenterol. 2018 Jan;34(1):3-10.

      (6) Lau CS, Chamberlain RS. Probiotics are effective at preventing Clostridium difficile-associated diarrhea: a systematic review and meta-analysis. Int J Gen Med. 2016 Feb 22; 9:27-37.

      (7) Yeleswaram K, et al. Pharmacokinetics and oral bioavailability of exogenous melatonin in preclinical animal models and clinical implications. J Pineal Res. 1997 Jan;22(1):45-51.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and the CAPE shows a good and promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that corroborate them. It is solid work and very promising.

      Weaknesses:

      Some references are too old or missing.

      Thanks for your constructive suggestion. We have included and refreshed several references to enhance the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the manuscript convincingly demonstrates that CAPE affects the TcdB toxin and reduces its toxicity in vitro, it would be beneficial to include data on the effect of CAPE on the growth of C. difficile. This would help ensure that the observed in vivo effects are not merely due to reduced bacterial growth but rather due to the specific action of CAPE on the toxin.

      Thanks for your constructive suggestion. We have augmented our findings with the impact of CAPE on the bacteria themselves, revealing that CAPE not only hampers the growth of the bacterial cells but also suppresses their capacity to produce spores. Please see Figure S3.

      (1) Line 41, line 115 - authors should clarify what they mean when mentioning Bacteroides within parentheses.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 71 - Is C. difficile really found "in the environment"?

      Thanks for your comments. C. difficile is prevalent across various natural settings, including soil and water ecosystems. A study has identified highly diverse strains of this bacterium within environmental samples(1). Moreover, the significant presence of C. difficile in soil and lawn specimens collected near Australian hospitals indicates that the organism is indeed a common inhabitant in the environment(2).

      (3) Lines 128-130 - Was there a wash step here? What could be the impact of compound carryover in this experiment?

      Thanks for your comments. Following pre-incubation of TcdB with CAPE, compounds that had not bound to TcdB were removed by centrifugation. If the compound persisted in the culture after washing, its efficacy could be overestimated, particularly if it continued to engage with TcdB or the cells beyond the initial 1-hour pre-incubation window. Carryover might also give rise to false-positive results, where the compound seems to confer protection or inhibition against TcdB-mediated cell rounding, whereas such effects are actually due to the lingering activity of the compound. This carryover could skew the determination of the compound's minimum effective concentration, as the effective concentration interacting with the cells might be inadvertently elevated. Furthermore, if the compounds possess cytotoxic properties or impact cell viability, carryover could generate artifacts in cell morphology that are unrelated to the direct interaction between TcdB and the compounds.

      (4) Lines 133-134 - I suggest authors mention how many caffeic acid derivatives there were in the entire library so that the suggested "enrichment" of them in the group of bioactive compounds can be better judged.

      Thanks for your comments. The natural compound library contained eight caffeic acid derivatives, of which methyl caffeic acid and ferulic acid displayed no efficacy. This information has been incorporated into the manuscript.

      (5) Line 135 - I recommend the authors add the molarity of the compound solutions used.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (6) Line 247 - I think the term "CAPE mice" is confusing. Please use a full description.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (7) Line 248 - I also think the terms "model mice" and "model group" are confusing. Maybe call them "control mice"?

      Thanks for your comments. The terms "model mice" and "model group" are indeed synonymous, and we have subsequently clarified that control mice refer to those that have not been infected with C. difficile.

      (8) Line 273 - "most abundant species at the genus level" is incorrect. I think what you mean is "most abundant TAXA".

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (9) Line 278 - Please include your p-value cut-off together with the LDA score.

      Thanks for your comments. We have revised the above description to “LDA score > 3.5, p < 0.05”.

      (10) Line 292 - Details on how metabolomics was performed should be included here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (11) Line 299 - 1.5 is a fairly low cut-off. The authors should at a minimum also include the p-value cut-off used.

      Thanks for your comments. We have revised the above description to “fold change > 1.5, p < 0.05”.
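
      A combined cut-off of this kind can be sketched as a simple filter. Everything except the two thresholds is an assumption made for illustration: the metabolite names and values are hypothetical, and the source does not say whether the fold-change criterion was applied in both directions, so the sketch exposes that choice as a parameter.

```python
import math

FOLD_CHANGE_CUTOFF = 1.5  # threshold quoted in the response
P_VALUE_CUTOFF = 0.05     # threshold quoted in the response


def is_differential(fold_change, p_value, two_sided=True):
    """Apply the fold-change and p-value cut-offs to one metabolite.

    two_sided=True (an assumption) also accepts decreases, i.e. ratios
    below 1/1.5, by comparing |log2(fold change)| with log2(1.5).
    """
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    if two_sided:
        passes_fold_change = abs(math.log2(fold_change)) > math.log2(FOLD_CHANGE_CUTOFF)
    else:
        passes_fold_change = fold_change > FOLD_CHANGE_CUTOFF
    return passes_fold_change and p_value < P_VALUE_CUTOFF


# Hypothetical records: (name, fold change treated/model, p-value).
records = [
    ("metabolite_A", 2.0, 0.01),   # passes both cut-offs
    ("metabolite_B", 1.2, 0.001),  # significant but small change
    ("metabolite_C", 0.5, 0.03),   # two-fold decrease
    ("metabolite_D", 3.0, 0.20),   # large change, not significant
]

selected_two_sided = [name for name, fc, p in records if is_differential(fc, p)]
selected_up_only = [name for name, fc, p in records if is_differential(fc, p, two_sided=False)]
```

      Requiring both criteria guards against reporting small but statistically significant shifts, as well as large but noisy ones, which is the concern the reviewer raised about using a fold-change cut-off alone.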

      (12) Line 307 - Purine "degradation" would be better here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (13) Line 328 onward - The melatonin experiment is a weird one. Although I fully understand the rationale behind testing the effect of melatonin in the mouse model, the idea that just because melatonin levels changed in the gut it would act as a direct inhibitor of TcdB was very far-fetched, even though it ended up working. Authors should explain this in the manuscript.

      Thanks for your comments. Beyond our murine studies, we have confirmed that melatonin significantly diminishes TcdB-induced cytotoxicity at the cellular level (Figure 9A). Additionally, it has been documented that melatonin, acting as an antimicrobial adjuvant and anti-inflammatory agent, can decrease the recurrence of CDI(3). Consequently, we contend that the aforementioned statement is substantiated.

      (14) Lines 429-435 - There are seemingly contradictory pieces of information here. The authors state that adenosine is released from cells upon inflammation and that CAPE treatment caused an increase in adenosine levels. Later in this section, the authors state that adenosine prevents TcdA-mediated damage and inflammation. This should be clarified and better discussed.

      Thanks for your comments. Adenosine modulates immune responses and inflammatory cascades by interacting with its receptors, including its capacity to suppress the secretion of specific pro-inflammatory mediators. We have updated this depiction in the manuscript.

      (15) Lines 513-514 - How was this phenotype quantified?

      Thanks for your comments. Initially, we introduced TcdB at a final concentration of 0.2 ng/mL along with various concentrations of compounds into 1 mL of medium for a 1-h pre-incubation period. Subsequently, unbound compounds were removed through centrifugation, and the resulting mixture was then applied to the cells.

      (16) Figure 3 - panels are labeled incorrectly.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (17) Figure 5C - it is unclear what the different colors and labels represent.

      Thanks for your comments. In the depicted graph, blue denotes the total binding energy, red signifies the electrostatic interactions, green corresponds to the van der Waals forces, and orange indicates solvation or hydration effects. The horizontal axis represents the mutation of the amino acid residue at the respective position to alanine. As illustrated in Figure 5C, the mutations W520A and GTD exhibit the highest binding energies.

      References:

      (1) Janezic S, et al. Highly Divergent Clostridium difficile Strains Isolated from the Environment. PLoS One. 2016 Nov 23;11(11): e0167101.

      (2) Perumalsamy S, Putsathit P, Riley TV. High prevalence of Clostridium difficile in soil, mulch and lawn samples from the grounds of Western Australian hospitals. Anaerobe. 2019 Dec; 60:102065.

      (3) Sutton SS, et al. Melatonin as an Antimicrobial Adjuvant and Anti-Inflammatory for the Management of Recurrent Clostridioides difficile Infection. Antibiotics (Basel). 2022 Oct 25;11(11):1472.

      Reviewer #2 (Recommendations for the authors):

      Minor comments and questions.

      (1) Which form of TcdB is being used in these experiments?

      Thanks for your comments. The TcdB proteins used in this study are TcdB1 subtypes.

      (2) Why are THP-1 cells being used in these assays?

      Thanks for your comments. In this study, we employed a diverse array of cell lines, including Vero, HeLa, THP-1, Caco-2, and HEK293T, each selected to serve a specific experimental objective. The THP-1 cell line was included because a macrophage cell line was needed to make the experiments comprehensive, allowing both epithelial cells and macrophages to be tested. C. difficile is an intestinal pathogen, and immune clearance plays a vital role during infection, so THP-1 cells were used as representative immune cells.

      (3) Please improve the quality of the microscopy images in Figure 1.

      Thanks for your comments. We have improved the quality of the microscopy images in Figure 1.

      (4) Does the flow cytometry experiment in Figure 2B show internalization? Surface-bound toxins would provide the same histogram.

      Thanks for your comments. Figure 2B was employed to assess the internalization of TcdB, and the findings indicate that CAPE does not influence the internalization process of TcdB.

      (5) The sensorgram in Figure 4A does not look typical and should be clarified.

      Thanks for your comments. Typically, small molecules and proteins engage in a rapid binding and dissociation dynamic. However, as depicted in Figure 4A, the interaction between CAPE and TcdB demonstrates a gradual progression towards equilibrium. This behavior can be primarily explained by the swift occupation of the protein's primary binding sites by the small molecule in the initial stages. Subsequently, CAPE binds to secondary or lower affinity sites, extending the time needed to reach equilibrium. Additionally, the likelihood of CAPE binding to multiple sites on TcdB requires time for the exploration and occupation of these diverse locations before equilibrium is attained. We have incorporated an analysis of this potential scenario into the manuscript.

      Reviewer #3 (Recommendations for the authors):

      These are my suggestions for the text:

      (1) Line 29: high recurrent rates.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 32: Where is the caffeic acid identified? I think a line should be included.

      Thanks for your comments. Caffeic acid was identified from a natural compound library, and we have completed the corresponding modifications according to the suggestions.

      (3) Line 39: C. difficile is not italic.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (4) Line 41: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (5) Line 56: Is this number of casualties (56,000) still current, or was it in the past?

      Thanks for your comments. The mortality rates reported in the manuscript reflect a downturn in the incidence and fatality of CDI around 2017(1), as the infection gained broader recognition. Nonetheless, a recent study reveals that the mortality rate for CDI cases in Germany can soar to 45.7% within a year, with the overall economic burden amounting to approximately 1.6 billion euros. This underscores the ongoing significance of CDI as a global public health challenge(2).

      (6) Line 104: Where did the idea of testing caffeic acid come from? Any previous study of the authors? Any studies with the inhibition of other pathogens?

      Thanks for your comments. Initially, we conducted a screen of a compound library comprising 2,076 compounds and identified several potent inhibitors, which, upon structural analysis, were revealed to be caffeic acid derivatives. Prior to our investigation, no studies had explored the potential of CAPE in this context.

      (7) Line 115: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      Results section

      (8) Did the authors try the caffeic acid with the TcdA or binary toxin? I know this is not the purpose of the study, but TcdA toxin has a high identity structure with TcdB and generates inflammation in the gut via neutrophils. Negative strains for the major toxins and positive for the binary toxin also cause severe cases of CDI.

      Thanks for your comments. Although we acknowledge the significance of TcdA and binary toxins in CDI, we did not investigate the impact of CAPE on these toxins. Our focus was exclusively on the effect of CAPE against TcdB, as it is the primary virulence factor in C. difficile pathogenesis. Since TcdA and TcdB are highly similar in structure, we will analyze the neutralization effect of CAPE on TcdA in later studies.

      (9) Does caffeic acid have any effect on C. difficile? Or does it only gain the toxins? That would be ideal.

      Thanks for your comments. We have included additional related assays in our study. Beyond directly neutralizing TcdB, CAPE also demonstrates the capacity to inhibit the growth and spore formation of C. difficile.

      (10) Line 230: C. difficile BAA-1870 is a clinical strain? There are no details about it in the paper.

      Thanks for your comments. C. difficile BAA-1870 (RT027/ST1), a highly virulent isolate frequently employed in research(3-6), was kindly donated by Professor Aiwu Wu. We have meticulously noted the PCR ribotype in our manuscript.

      (11) Line 236: Did the mice fully recover from CDI after the administration of the CAPE? Was one dose enough?

      Thanks for your comments. CAPE was administered orally at 24 h intervals, commencing with the initial dose on Day 0. By the time a significant difference was observed on Day 3, the treatment had been administered a total of three times.

      Methodology

      (12) Most of the methods do not have a reference.

      Thanks for your comments. We have added several references to the methods.

      Discussion section

      (13) The first two paragraphs of the discussion should be summarized. Those details were already explained in the introduction.

      Thanks for your comments. The discussion section and the introduction address slightly different focal points; therefore, we aim to retain the first two paragraphs to maintain continuity and context.

      (14) Line 382: Bezlotoxumab was approved by the FDA in 2016. It is not recent.

      Thanks for your comments. We have revised the above description.

      (15) Line 410: "Despite the high cure rate and increasing popularity of FMT, its safety remains controversial." Although this is true, recently (2022) the FDA approved Rebyota, which was later cited by the authors.

      Thanks for your comments. We have revised the above description.

      (16) Lines 415-416: "the abundance of Bacteroides, a critical gut microbiota component that is required for C. difficile resistance". There is only one reference cited by the authors. I suppose that if it is true, more studies should be mentioned. Why are probiotics with Bacteroides spp. not available in the market?

      Thanks for your comments. We have supplemented additional references. The scarcity of probiotic products containing Bacteroides spp. on the market is primarily attributable to the stringent requirements of their survival conditions. As most Bacteroides spp. are anaerobic, they thrive in oxygen-deprived environments. This unique survival trait poses challenges in maintaining their viability during product preservation and distribution, which in turn escalates production costs and complexity. Furthermore, despite the significant role of Bacteroides in gut health, research into its potential probiotic benefits and safety is comparatively underexplored.

      References:

      (1) Guh AY, et al. Emerging Infections Program Clostridioides difficile Infection Working Group. Trends in U.S. Burden of Clostridioides difficile Infection and Outcomes. N Engl J Med. 2020 Apr 2;382(14):1320-1330.

      (2) Schley K, et al. Costs and Outcomes of Clostridioides difficile Infections in Germany: A Retrospective Health Claims Data Analysis. Infect Dis Ther. 2024 Nov 20.

      (3) Saito R, et al. Hypervirulent clade 2, ribotype 019/sequence type 67 Clostridioides difficile strain from Japan. Gut Pathog. 2019 Nov 4; 11:54.

      (4) Pellissery AJ, Vinayamohan PG, Venkitanarayanan K. In vitro antivirulence activity of baicalin against Clostridioides difficile. J Med Microbiol. 2020 Apr;69(4):631-639.

      (5) Shao X, et al. Chemical Space Exploration around Thieno[3,2-d]pyrimidin-4(3H)-one Scaffold Led to a Novel Class of Highly Active Clostridium difficile Inhibitors. J Med Chem. 2019 Nov 14;62(21):9772-9791.

      (6) Mooyottu S, Flock G, Venkitanarayanan K. Carvacrol reduces Clostridium difficile sporulation and spore outgrowth in vitro. J Med Microbiol. 2017 Aug;66(8):1229-1234.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Chabukswar et al analysed endogenous retrovirus (ERV) Env variation in a set of primate genomes using consensus Env sequences from ERVs known to be present in hominoids using a Blast homology search with the aim of characterising env gene changes over time. The retrieved sequences were analysed phylogenetically, and showed that some of the integrations are LTR-env recombinants.

      Strengths

      The strength of the manuscript is that such an analysis has not been performed yet for the subset of ERV Env genes selected and most of the publicly available primate genomes.

      Weaknesses

      Unfortunately, the weaknesses of the manuscript outnumber its strengths. Especially the methods section does not contain sufficient information to appreciate or interpret the results. The results section contains methodological information that should be moved, while the presentation of the data is often substandard. For instance, the long lists of genomes in which a certain Env was found could better be shown in tables. Furthermore, there is no overview of the primate genomes, or accession numbers, used. It is unclear whether the analyses, such as the phylogenetic trees, are based on nucleotide or amino acid sequences since this is not stated. tBLASTn was used in the homology searches, so one would suppose aa are retrieved. In the Discussion, both env (nt?) and Env (aa?) are used.

      For the non-hominoids, genome assembly of publicly available sequences is not always optimal, and this may require Blasting a second genome from a species. Which should for instance be done for the HML2 sequences found in the Saimiri boliviensis genome, but not in the related Callithrix jacchus genome. Finally, the authors propose to analyse recombination in Env sequences but only retrieve env-LTR recombinant Envs, which should likely not have passed the quality check.

      Since the Methods section does not contain sufficient information to understand or reproduce the results, while the Results are described in a messy way, it is unclear whether or not the aims have been achieved. I believe not, as characterisation of env gene changes over time is only shown for a few aberrant integrations containing part of the LTR in the env ORF.

      We thank the reviewer for the critiques of the manuscript and their constructive suggestions to improve the clarity, methodological rigor, and data presentation.

      (1) The concern regarding the insufficient data in the methods has been resolved in the revised manuscript by adding a supplementary file that contains the genome assemblies that were used to perform the tBLASTn analysis using the reconstructed Env sequences. The requested accession numbers are available for all sequences in the supplementary phylogenetic figures.

      (2) We have also modified the manuscript by moving a portion of the results section to the methods section, in particular the full methodological description of the Env reconstruction (Lines 197-231).

      (3) As suggested, the long list of genomes in which the Env tBLASTn hits were obtained, previously given in the results section, is now provided in table form (Table 2) as an overall summary of the distribution of ERV Env in the genomes, and the genome assemblies are listed in Supplementary file 2.

      (4) As for the point regarding the tBLASTn usage in the homology searches, we first performed tBLASTn analysis using the reconstructed Env amino acid sequences as query and performed tBLASTn similarity search in the primate genomes. The tBLASTn algorithm uses the amino acid sequences to compare with the translated nucleotide database in all six frames and hence the hits obtained are nucleotide sequences (Line 381-383). These nt sequences were used for all the further analysis such as sequence alignment, phylogenetic analysis and recombination analysis. For better clarity, we have specified the use of env nt alignments in the methods section to avoid the raised confusion in the discussion.
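
The six-frame comparison described in point (4) can be illustrated with a minimal sketch. It is not the tBLASTn implementation and not the authors' pipeline: it only shows why a protein query ends up matched against nucleotide sequence, by translating a DNA string in the three forward and three reverse-complement frames with the standard genetic code. The function names and the toy sequence are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated peptides."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy nucleotide "database" entry; a protein query is compared to each frame.
    for frame, peptide in six_frame_translations("ATGGCCTAA").items():
        print(frame, peptide)
```

Because the match is found in a translated frame of a nucleotide record, the reported hit is a nucleotide interval, which is consistent with the downstream alignments being built from nt sequences.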

      (5) For the HML supergroup characterization in the squirrel monkey genome (Saimiri boliviensis), we used the tBLASTn hits obtained in S. boliviensis from the initial analysis to perform comparative genomics in two Platyrrhini genomes available on the UCSC Genome Browser. In particular, this analysis was performed to confirm the presence of specific members of the HML supergroup in the squirrel monkey genome, which has not been previously reported. We used the available genome assemblies because of the annotations available on the Genome Browser, and especially the possibility of using the RepeatMasker tracks and the comparative genomics tools with the human genome as a reference. We reported the coordinates for the members of the HML supergroup that were retrieved through the comparative genomic assemblies by applying the RepeatMasker custom track, which includes many ERVs that are not present in the NCBI reference genomes.

      (6) The concern regarding only retrieving env-LTR recombinant Envs has been addressed in the revised results section (Lines 747-758). As also mentioned in the methods section, the RDP software detects the recombinant sequences and a breakpoint position for the recombinant signals and hence we confirmed only those sequences that were predicted as potential recombinant sequences by the RDP software through comparative genomics. All the sequences predicted by the software were env-LTR recombinant and hence we confirmed and reported only those recombinant sequences in the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The paper could be strengthened by:

      - a rigorous rewriting and shortening of the manuscript, thereby eliminating all textbook-like paragraphs, and all biological misinterpretations and confusions. Distinguish between retroviral replication as an exogenous virus, and host genome remodeling affecting ERVs. Rewrite the sections on template switching by RT being the basis for the observed recombinations, while host genome recombinations are far more likely. ERVs with such aberrant env/LTR gene recombination are unlikely to be fit for cross-species transmission. Likely, such a recombinant was generated in a common ancestor. Also, host RNA polymerase II transcribes retroviral RNA (line 79), not RT.

      - check lines 89-90 as pro is part of the pol gene in gamma- and lentiviruses.

      We thank the reviewer for the suggestion. We have revised the manuscript by shortening the introduction section, eliminating the textbook-like paragraphs, and clarifying the recombination mechanism. The introduction section has been revised at Lines 102-111, and the clarification of the recombination mechanism is provided at lines 1668-1675.

      - adding much more information to the Methods section. Such as which genomes were searched, were nt or aa have been retrieved and analysed, were multiple genomes of a species searched, a list of databases used ('various databases' in line 164 does not suffice), etc.

      We thank the reviewer for the observation. As mentioned above, in the revised manuscript we have provided more detailed methods by including a supplementary file listing the genome assemblies used for the tBLASTn analysis and comparative genomics. For the sequence alignment, phylogenetic analysis and recombination analysis we used nt sequences, as is also stated in the revised version. Lastly, all the databases that were used are now mentioned in the methods section.

      - more information is needed on the alignments and phylogenetic trees. For instance, how were indels treated? How long were the alignments on average regarding informative sites?

      We thank the reviewer for the questions. To answer them, we have added a paragraph (Lines 359-362) describing the reconstruction process in more detail.

      - confirm the findings about the presence or absence of an ERV, such as for the squirrel monkey genome, using additional genomes of the species

      As mentioned above, we only used the genome assemblies available on the Genome Browser because of the annotations available there. Searching a second NCBI RefSeq genome with the BLAST algorithm does not provide information and annotations as accurate as those of the Genome Browser. Hence, we reported the coordinates for the members of the HML supergroup that were retrieved through the comparative genomic assemblies by applying the RepeatMasker custom track, which includes many ERVs that are not present in the NCBI reference genomes.

      - present the lists of findings in primate genomes on pages 9 and 10 in tables

      We thank the reviewer for the suggestion, we have provided a new table (Table 2) in the revised version summarizing the ERV Env distribution results.

      - a significant limitation of the study is that only env ERVs found in hominoids have been searched in OWM and NWM, not ones specific for monkeys. This should be mentioned somewhere.

      As the reviewer pointed out, the study was designed to explore ERVs’ Env sequences in hominoids, which were then searched in the OWM and NWM genomes. This is now better stated in the introduction at Lines 57-60.

      - define abbreviations at first use (e.g. HML in abstract)

      We thank the reviewer for the suggestion, we have mentioned the abbreviations in the abstract, where we mentioned HML first (Line 65)

      - explain 'pathological domestication' (line 42). Domestication implies usefulness to the host. And over time, deleterious insertions would have been likely purged from a population.

      We thank the reviewer for the observation, we have modified the sentence and provided a clearer explanation for the pathological and physiological consequences of ERVs’ env (lines 52-57).

      Furthermore:

      - why begin the discussion with a lengthy description of domestication and syncytins, which is not part of the current study?

      We thank the reviewer for the critique. Accordingly, we have now modified the discussion section by shortening the part about domestication of syncytins, and just mentioned them as an example at lines 942-944.

      - how can 96 hits have been retrieved for spuma-like envs (line 506), while it was earlier reported (line 333), that the most hits were gamma-like?

      We thank the reviewer for the observation, we have clarified and explained how 96 hits have been retrieved for spuma-like envs in lines 670-677 of the discussion section.

      English grammar should be improved throughout the manuscript.

      And I could not open half of the supplementary files

      As suggested, we have revised the English and checked that all files open correctly.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Chabukswar et al. describes a comprehensive attempt to identify and describe the diversity of retroviral envelope (env) gene sequences present in primate genomes in the form of ancient endogenous retrovirus (ERV) sequences.

      Strengths:

      The focus on env can be justified because of the role the Env proteins likely played in determining viral tropism and host range of the viruses that gave rise to the ERV insertions, and to a lesser extent, because of the potential for env ORFs to be coopted for cellular functions (in the rare cases where the ORF is still intact and capable of encoding a functional Env protein). In particular, these analyses can reveal the potential roles of recombination in giving rise to novel combinations of env sequences. The authors began by compiling env sequences from the human genome (from human endogenous retrovirus loci, or "HERVs") to build consensus Env protein sequences, and then they use these as queries to screen other primate genomes for group-specific envs by tBLASTn. The "groups" referred to here are previously described, as unofficial classifications of endogenous retrovirus sequences into three very broad categories - Class I, Class II and Class III. These are not yet formally recognized in retroviral taxonomy, but they each comprise representatives of multiple genera, and so would fall somewhere between the Family and Genus levels. The retrieved sequences are subject to various analyses, most notably they are screened for evidence of recombination. The recombinant forms appear to include cases that were probably viral dead-ends (i.e. inactivating the env gene) even if they were propagated in the germline.

      The availability of the consensus sequences (supplement) is also potentially useful to others working in this area.

      Weaknesses:

      The weaknesses are largely in presentation. Discussions of ERVs are always complicated by the lack of a formal and consistent nomenclature and the confusion between ERVs as loci and ERVs as indirect information about the viruses that produced them. For this reason, additional attention needs to be paid to precise wording in the text and/or the use of illustrative figures.

      We thank the reviewer for the general observation. We put additional attention to the wording in text/figures, and hope to have improved the manuscript clarity.

      Reviewer #2 (Recommendations for the authors):

      Reviewing the manuscript was a challenge because figures were difficult to read. As provided, the fonts were sometimes too small to read in a standard layout and had to be expanded on screen.

      The tree in Figure 3 could also be made easier to read, for example if the authors collapsed related branches and gave the clusters a single, clear label (this is not necessary, just a suggestion) - especially if the supplementary trees have all the labelled branches for any readers who want specific details.

      I also recommend asking a third party (perhaps a scientific colleague) with fluency in English grammar and familiarity with English scientific idiom to provide some editorial feedback on the text.

      Figure 4 legend is confusing. From the description it sounds like the tree in 4B is a host phylogeny, but it's not clearly stated. And if so, how was the tree generated? Is it based on entire genomes? Include at least enough methodological detail or citations that someone could recreate it, if necessary. The details and how it was done should be briefly mentioned here and in detail in the Methods section.

      We thank the reviewer for the observation. As for Figure 4 we have modified its legend and more clearly stated how the phylogenetic tree of the primate genomes was generated using TimeTree. We have also provided further details in the methods section (Lines 475-489).

      As suggested we have revised English.

      Line 42 - what is "pathological domestication"? It sounds like a contradiction in terms.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation for the pathological and physiological consequences of ERVs’ env (lines 52-57).

      Lines 166-167 - the authors use the word "classes" but then use a list of terms that correspond to genera within the Retroviridae. The authors should be cautious here, as "class" and "genus" are both official taxonomic terms with different meanings. Do they mean genus? Or, if a more informal term is needed, perhaps "group"?

      Thank you for the observation, the ERVs have been classified into three classes (Class I, II and III) based on the relatedness to the exogenous retroviruses Gammaretrovirus, Betaretrovirus and Spumaretrovirus genera respectively and hence have been mentioned in the manuscript as per the nomenclature proposed by Gifford et al., 2018 which has been cited at Lines 122-125.

      Line 221- "defferent" should be "different"

      Corrected

      Lines 233-234 - what is meant by "canonical" and "non-canonical" forms? Can the authors please define these two terms?

      Thank you for the question, canonical refers to sequences that are well-preserved and match the structural and functional features of complete env genes, and non-canonical refers to sequences with significant structural alterations or truncations that deviate from this typical form. This explanation has been mentioned in the revised version at Lines 475-479.

      Line 252 - if/is

      Corrected

      Lines 274-276 needs a citation to the paper(s) that reported this.

      Corrected

      Line 283-285 - this was confusing. How could the authors have noted distinct occurrences and clusters of these if they were excluded from the BLAST analysis? It says the consensus sequences were effectively representing these, but doesn't this raise the possibility that the consensus sequences are not specific enough? Could this also then lead to false identification? Perhaps a few more words to explain should be added.

      We thank the reviewer for the observation. While performing the tBlastn search we did obtain the hits for HERV15, HERVR, ERVV1, ERVV2 and PABL, and we have mentioned the detailed explanation about this observation in the revised manuscript at lines 619-627.

      Line 298 - missing comma

      Corrected

      Lines 348-351- this list is not a list of recombination mechanisms. Template switching is a mechanism of recombination, but "acquisition" is simply a generic term, "degradation" is not a mechanism, and "cross-species transmission" might be a driver or a result of recombination, but it is not a mechanism of recombination.

      We thank the reviewer for the observation. We have revised the explanation of the recombination events in the discussion section, as some parts of the results have been moved to the discussion section (Lines 1058-1065).

      Lines 369-372. It's not clear why this means the event was a "very recent occurrence". Do the authors mean that there were shared integration sites between some of the species, and that these sites lacked the insertions in other species (e.g. gibbon, orangutan, monkeys)?

      For the long section on recombination events involving an env sequence with an LTR in it, can the authors explain how they know when it's a recombination event versus integration of one provirus into another one, followed by recombination between LTRs to generate a solo-LTR?

      We thank the reviewer for the observation. Regarding the very recent occurrence of the recombination event, we have explained it in revised manuscript at lines 769-824 writing “In fact, the recombinant sequences were shared only between 4 species of Catarrhini parvorder and were absent in more distantly related primates (such as gibbons, orangutans, etc.). This with the presence of shared recombination sites suggests that the insertion occurred after the divergence of these species, while its absence in others indicate that it is a recombination event.”

      For the observation regarding the env-LTR recombination events, the recombinants were first detected by the RDP software and were further validated through the BLAT search in the genomes available on genome browser. The explanation on how we obtained these env-LTR recombination events is now provided in lines 746-763 of the revised manuscript.

      Methods Lines 151-168 and Figure 1 legend Lines 689-690 - how did the authors distinguish between "translated regions" corresponding to the actual Env protein sequence from translation of the other two reading frames? That is, there must have been substantial "translatable" stretches of sequence in the two incorrect reading frames as well as the reading frame corresponding to Env, so the question is how were the correct ones identified for the reconstruction?

      We thank the reviewer for the observation. We have provided the detailed explanation to the observation in the methods section (Lines 335-359).

      Line 495 - "previously reported" should include citation(s) of the prior report(s).

      We thank the reviewer for the observation, we have provided appropriate citations.

      Line 525 - the authors propose that the mechanism "is the co-packaging of different ERVs in a virus particle". First, I assume they meant to say that RNA from different ERVs is co-packaged. Second, isn't it also possible or likely that these could arise from co-packaging of exogenous retrovirus RNAs and recombination, especially if the related exogenous forms were still circulating at the time these things arose?

      We thank the reviewer for the observation. In the revised manuscript, we have modified the proposed mechanism to also include the possibility of co-packaging of exogenous retrovirus RNAs and recombination (lines 1082-1099).

      Line 686 - env should either be italicized (gene) or capitalized (protein), depending on what the authors intended here.

      We thank the reviewer for the observation. We have corrected the typographical error in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Retroviruses have been endogenized into the genome of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations hence can be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations made include the extensive recombination events between these retroviruses that were previously unknown and the discovery of HML species in genomes prior to the splitting of old and new world monkeys.

      Strengths:

      They explored a number of databases and made phylogenetic trees to look at the distribution of retroviral species in primates. The authors provide a strong rationale for their study design, they provide a clear description of the techniques and the bioinformatics tools used.

      Weaknesses:

      The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species. The analyses thus likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.

      Not sure which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are ones that have many ERVs that are not present in the reference genomes. Also, long range sequencing of the human genome has recently become available which may also be worth studying for this purpose.

      We thank the reviewer for the observations and comments. We would like to clarify that the intent of the work was to perform bioinformatics analysis, and so wet-lab experimental verification of the observations is outside the scope of the present manuscript. For the aim of the manuscript, we used the NCBI reference genomes, while for reporting the coordinates of the HML supergroup in the squirrel monkey genome and the coordinates of the recombination events through BLAT search we used genome assemblies available on the Genome Browser with the RepeatMasker custom track, since it has well-represented ERV annotations.

      The suggestion to use long range sequencing of the human genome is an interesting perspective, and in future work we will try to implement it in our analysis as well as perform experimental verification, since, again, the present work does not include a wet-lab experimental component.

      Reviewer #3 (Recommendations for the authors):

      In a few places the term HERV has been used when describing ERVs in non-human primates. This needs to be corrected.

      We thank the reviewer for the observation. We have checked and accordingly modified the terms in the manuscript wherever necessary.

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps performed and hierarchical methods used to ensure that the model we report in the main text was the best fit to the data while not overfitting. We are not certain about what is meant by “[a]ddressing model transparency,” but as described in our response to Reviewer 1 below, we have now more clearly explained (with references) that the use of hierarchical estimation procedures allows for information sharing across participants, which improves the reliability and stability of parameter estimates—even when the number of trials per individual is small. We have clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (more complex models). Although details about model diagnostics, recoverability, and posterior predictive checks are all provided in the Supplementary Materials, we have clarified for the less familiar reader how each of these steps ensures that the parameters we estimate are not only identifiable and interpretable, but also ensure that the model can reproduce key patterns in the data, supporting the validity of the model. Additionally, we have provided all scripts for estimating the models by linking to our public Github repository. Furthermore, we have edited language throughout to eliminate any implication of causal claims and acknowledged the limitation of the small sample size.
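
For readers less familiar with how a Bayesian criterion can penalize complexity, the following is a minimal sketch assuming the criterion is WAIC, which Reviewer 1 names in the comments on model comparison. It is not taken from the authors' scripts. It uses the common definition WAIC = -2 (lppd - p_WAIC), where lppd is the log pointwise predictive density and p_WAIC, the effective number of parameters, is the sum over observations of the posterior variance of the log-likelihood. The function names and the toy log-likelihood values are hypothetical.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def waic(log_lik):
    """Compute (WAIC, p_waic) on the deviance scale.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s. At least two draws are required for the variance term.
    """
    n_obs = len(log_lik[0])
    columns = [[draw[i] for draw in log_lik] for i in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)
    p_waic = sum(variance(col) for col in columns)  # complexity penalty
    return -2.0 * (lppd - p_waic), p_waic


if __name__ == "__main__":
    # Toy example: two posterior draws, two observations.
    stable = [[-1.0, -2.0], [-1.0, -2.0]]    # draws agree: no penalty
    variable = [[-1.0, -2.0], [-3.0, -2.0]]  # draws disagree: penalty > 0
    print(waic(stable))
    print(waic(variable))
```

Lower values indicate better expected out-of-sample fit. A model with more free parameters typically shows more posterior variability in its pointwise log-likelihoods, so its penalty term grows, and it is preferred only if the gain in lppd outweighs that penalty.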

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. On the other hand, Shevlin and colleagues ask each participant to make two sets of 42 choices with two times fewer participants than in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regards to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can forestall reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach which leverages group information to constrain individual-level estimates. This use of group-level parameters to inform individual-level estimates reduces overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents extreme parameter estimates that could arise from noisy or limited data (Rouder & Lu, 2005). As a result, hierarchical estimation has been repeatedly shown to work well in settings with low trial counts, including as few as 40 trials per condition (Ratcliff & Childers, 2015; Wiecki et al., 2013), and previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for the modeling of tasks with fewer trials. Additionally, our successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.
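
      The regularization argument can be illustrated with a toy normal-normal model, in which a participant-level estimate is pulled toward the group mean in proportion to how noisy it is. This is a sketch for intuition only and not the hierarchical DDM used in the study; the function name and every numeric value below are hypothetical.

```python
def shrunken_estimate(subject_mean, n_trials, group_mean, within_sd, between_sd):
    """Posterior mean of a subject-level parameter under a normal-normal model.

    within_sd: trial-to-trial noise for one participant (assumed known).
    between_sd: spread of true parameters across participants (assumed known).
    """
    sampling_var = within_sd ** 2 / n_trials          # noise in the raw subject mean
    weight = between_sd ** 2 / (between_sd ** 2 + sampling_var)
    # weight near 1: trust the participant's own data; near 0: trust the group.
    return weight * subject_mean + (1.0 - weight) * group_mean


# The same raw estimate is shrunk more strongly when fewer trials support it.
for n in (42, 400, 1000):
    estimate = shrunken_estimate(subject_mean=2.0, n_trials=n, group_mean=1.0,
                                 within_sd=3.0, between_sd=0.5)
    print(n, round(estimate, 3))
```

      With few trials, the estimate sits closer to the group mean, which is the sense in which pooling limits extreme individual-level values.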

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the supplementary materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more there are parameters to estimate in a model, the better it fits the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model - i.e. 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may be also suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are important, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful for the paper to explicitly note that we specifically selected WAIC as one of two methods to assess model fit because it penalizes for model complexity. We now explicitly state that, in addition to being more robust than other metrics like AIC or BIC when comparing hierarchical Bayesian models like those in the current study, model fit metrics like WAIC penalize for model complexity based on the number of parameters (Watanabe, 2010). Therefore, it is not the case that more complex models (i.e., having additional parameters) would automatically have lower WAICs. Additionally, we note that our second method to assess model fit, posterior predictive checks, demonstrates that only model M3 can reproduce key behavioral patterns present in the empirical data. As described in the Supplementary Materials, M1 and M2 miss those patterns in the data. In summary, we used best practices to assess model fit and reliability (Wilson & Collins, 2019): results from the WAIC comparison (which in fact penalizes models with more parameters) and results from posterior predictive checks align in showing that M3 best fits our data. We have added a sentence to the manuscript to state this explicitly.
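
      As a sketch of how the complexity penalty enters, WAIC can be computed from a matrix of pointwise log-likelihoods as the log pointwise predictive density minus a penalty term (the effective number of parameters, estimated as the posterior variance of each observation's log-likelihood). The code below assumes the common deviance-scale convention and a hypothetical input layout; it is not the authors' analysis code.

```python
import math
from statistics import variance


def waic(log_lik):
    """Return (waic, p_waic) on the deviance scale.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s (hypothetical layout; at least two draws are required).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density (fit term)
    p_waic = 0.0  # effective number of parameters (penalty term)
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        peak = max(column)  # log-sum-exp trick for numerical stability
        lppd += peak + math.log(sum(math.exp(v - peak) for v in column) / n_draws)
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic), p_waic


# Two draws, two observations: variation across draws produces a nonzero penalty.
score, penalty = waic([[-0.5, -2.5], [-1.5, -1.5]])
print(round(score, 3), penalty)
```

      Because the penalty is subtracted from the fit term, a model with extra flexibility only obtains a lower WAIC if the gain in predictive density outweighs the added penalty.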

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting suggestions and appreciate the opportunity to clarify that we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics. We also agree that more time-locked associations with craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the discussion. However, associating task-based behavior with validated self-report measures that assess symptom severity over long periods of time that precede the task visit (e.g., over the past 2 weeks in depression, over the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify: the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point about negative affect not always producing loss-of-control eating in individuals with BN, we also now explicitly note that while several studies employing ecological momentary assessments (EMA) have repeatedly shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect, and that future studies should integrate food choice task data pre and post-affect inductions with measures that capture the specific frequency of loss of control eating episodes that occur during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s perspective and suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added additional sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed no relationship with parameters of interest. While we agree additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team has collected data from individuals with AN using this task, but not with any affect induction or measure of affect. Therefore, we have added this important direction for future research to the discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant and methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper that focused only on ultimate food choice behaviors, we now specifically acknowledge that the study was only powered to detect small to medium group differences in the effect of negative emotion on these final choice behaviors. Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation. Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have added to our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the small sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our original acknowledgements of limitations in the Discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the supplementary materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019), and which we now report in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we don’t have any measurements of cravings, we did measure negative urgency. Despite these prior findings, the original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the supplementary materials.      

      References

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., Choudhury, M. D., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9), Article 9. https://doi.org/10.1038/s41562-020-0893-y

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Udo, T., & Grilo, C. M. (2018). Prevalence and Correlates of DSM-5–Defined Eating Disorders in a Nationally Representative Sample of U.S. Adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying Transdiagnostic Mechanisms in Mental Health Using Computational Factor Modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study has preliminarily revealed the role of ACVR2A in trophoblast cell function, including its effects on migration, invasion, proliferation, and clonal formation, as well as its downstream signaling pathways.

      Strengths:

      The use of multiple experimental techniques, such as CRISPR/Cas9-mediated gene knockout, RNA-seq, and functional assays (e.g., Transwell, colony formation, and scratch assays), is commendable and demonstrates the authors' effort to elucidate the molecular mechanisms underlying ACVR2A's regulation of trophoblast function. The RNA-seq analysis and subsequent GSEA findings offer valuable insights into the pathways affected by ACVR2A knockout, particularly the Wnt and TCF7/c-JUN signaling pathways.

      Weaknesses:

      The molecular mechanisms underlying this study require further exploration through additional experiments. While the current findings provide valuable insights into the role of ACVR2A in trophoblast cell function and its involvement in the regulation of migration, invasion, and proliferation, further validation in both in vitro and in vivo models is needed. Additionally, more experiments are required to establish the functional relevance of the TCF7/c-JUN pathway and its clinical significance, particularly in relation to pre-eclampsia. Additional techniques, such as animal models and more advanced clinical sample analyses, would help strengthen the conclusions and provide a more comprehensive understanding of the molecular pathways involved.

      Reviewer #2 (Public review):

      Summary:

      ACVR2A is one of a handful of genes for which significant correlations between associated SNPs and the incidences of preeclampsia have been found in multiple populations. It is one of the TGFB family receptors, and multiple ligands of ACVR2A, as well as its coreceptors and related inhibitors, have been implicated in placental development, trophoblast invasion, and embryo implantation. This useful study builds on this knowledge by showing that ACVR2A knockout in trophoblast-related cell lines reduces trophoblast invasion, which could tie together many of these observations. Support for this finding is incomplete, as reduced proliferation may be influencing the invasion results. The implication of cross-talk between the WNT and ACVR2A/SMAD2 pathways is an important contribution to the understanding of the regulation of trophoblast function.

      Strengths:

      (1) ACVR2A is one of very few genes implicated in preeclampsia in multiple human populations, yet its role in pathogenesis is not very well studied and this study begins to address that hole in our knowledge.

      (2) ACVR2A is also indirectly implicated in trophoblast invasion and trophoblast development via its connections to many ligands, inhibitors, and coreceptors, suggesting its potential importance.

      (3) The authors have used multiple cell lines to verify their most important observations.

      Weaknesses:

      (1) There are a number of claims made in the introduction without attribution. For example, there are no citations for the claims that family history is a significant risk factor for PE, that inadequate trophoblast invasion of spiral arteries is a key factor, and that immune responses and renin-angiotensin activity are involved.

      Thank you for pointing out the lack of citations in some parts of the introduction. We have revised the manuscript to include appropriate references for the claims regarding family history as a risk factor for PE, the role of inadequate trophoblast invasion in spiral arteries, and the involvement of immune responses and the renin-angiotensin system. The revised text now includes citations to well-established studies in the field (Salonen Ros et al., 2000; Chappell LC et al., 2021; Brosens et al., 2002; Knofler et al., 2019; Redman CWG et al., 1999; LaMarca B et al., 2008). We believe these additions improve the scientific rigor of the manuscript.

      (2) The introduction states "As a receptor for activin A, ACVR2A..." It's important to acknowledge that ACVR2A is also the receptor for other TGFB family members, with varying affinities and coreceptors. Several TGFB family members are known to regulate trophoblast differentiation and invasion. For example, BMP2 likely stimulates trophoblast invasion at least in part via ACVR2A (PMID 29846546).

      Thank you for highlighting the broader role of ACVR2A as a receptor for multiple members of the TGF-β superfamily. We have revised the introduction to acknowledge that ACVR2A is not only the receptor for activin A but also interacts with other ligands, such as BMP2, which likely stimulates trophoblast invasion via ACVR2A (PMID: 29846546). This addition provides a more comprehensive view of ACVR2A's function in trophoblast biology. While the focus of our current study is on activin A, we agree that ACVR2A's role in mediating the effects of other TGF-β family members is an important topic for future research.

      (3) An alternative hypothesis for the potential role of ACVR2A in preeclampsia is its functions in the endometrium. In the mouse, ACVR2A knockout in the uterus (and other progesterone receptor-expressing cells) leads to embryo implantation failure.

      Thank you for bringing up the potential role of ACVR2A in the endometrium as an alternative hypothesis. We have revised the discussion to acknowledge this possibility and cited relevant studies showing that uterine-specific knockout of ACVR2A in mice leads to embryo implantation failure (Monsivais et al., 2021). This suggests that ACVR2A may play a critical role in uterine receptivity and embryo implantation, which could influence placental development and preeclampsia pathogenesis. While our current study focuses on trophoblast-related functions of ACVR2A, we agree that investigating its role in the uterine environment is an important direction for future research.

      (4) In the description of the patient population for placental sample collections, preeclampsia is defined only by hypertension, and this is described as being in accordance with ACOG guidelines. ACOG requires a finding of hypertension in combination with either proteinuria or one of the following: thrombocytopenia, elevated creatinine, elevated liver enzymes, pulmonary edema, and new-onset unresponsive headache.

      We appreciate the reviewer’s detailed observation regarding the definition of preeclampsia.

      We have reviewed and clarified our description of the diagnostic criteria based on the American College of Obstetricians and Gynecologists (ACOG) guidelines. Specifically, we have revised the definition in the Materials and Methods section under "Collection of Placenta and Decidua Specimens," as follows: In accordance with the guidelines from the American College of Obstetricians and Gynecologists (ACOG, 2023), preeclampsia (PE) is diagnosed as hypertension (systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg on at least two occasions) in combination with one or more of the following: proteinuria (≥300 mg/24-hour urine collection or protein/creatinine ratio ≥0.3), thrombocytopenia, elevated serum creatinine, elevated liver enzymes, pulmonary edema, or new-onset headache unresponsive to treatment.
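
      The revised definition has the logical form "hypertension AND at least one additional finding", which can be written out as a small rule. The sketch below encodes only the thresholds quoted in the text above; the function name, argument names, and finding labels are hypothetical, and it is meant to show the structure of the definition rather than to serve as a diagnostic tool.

```python
ADDITIONAL_FINDINGS = {
    "thrombocytopenia",
    "elevated_serum_creatinine",
    "elevated_liver_enzymes",
    "pulmonary_edema",
    "new_onset_headache_unresponsive_to_treatment",
}


def meets_stated_pe_definition(bp_readings, urine_protein_mg_24h=None,
                               protein_creatinine_ratio=None, findings=()):
    """Apply the definition quoted in the text to one participant's record.

    bp_readings: iterable of (systolic, diastolic) pairs in mmHg,
    one pair per measurement occasion.
    """
    # Hypertension: >=140 systolic or >=90 diastolic on at least two occasions.
    elevated = sum(1 for systolic, diastolic in bp_readings
                   if systolic >= 140 or diastolic >= 90)
    hypertension = elevated >= 2
    # Proteinuria: >=300 mg per 24 h, or protein/creatinine ratio >=0.3.
    proteinuria = (
        (urine_protein_mg_24h is not None and urine_protein_mg_24h >= 300)
        or (protein_creatinine_ratio is not None and protein_creatinine_ratio >= 0.3)
    )
    other_finding = any(f in ADDITIONAL_FINDINGS for f in findings)
    return hypertension and (proteinuria or other_finding)


# Hypertension alone does not satisfy the definition.
print(meets_stated_pe_definition([(150, 85), (142, 95)]))
```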

      (5) I believe that Figures 1a and 1b are data from a previously published RNAseq dataset, though it is not entirely clear in the text. The methods section does not include a description of the analysis of these data undertaken here. It would be helpful to include at least a brief description of the study these data are taken from - how many samples, how were the PE/control groups defined, gestational age range, where is it from, etc. For the heatmap presented in B, what is the significance of the other genes/ why are they being shown? If the purpose of these two panels is to show differential expression specifically of ACVR2A in this dataset, that could be shown more directly.

      Clarification of RNAseq dataset: The Methods section has been revised to specify the dataset source (GEO accession number: GSE114691), which includes 20 PE and 21 control placental samples with gestational ages ranging from 34 to 38 weeks. PE and control groups were defined using clinical criteria such as hypertension and proteinuria, and these details have also been added to the Results section.

      RNAseq analysis description: We have included details of the differential gene expression analysis in the Methods section. Specifically, the DESeq2 R package was used, with thresholds of FDR < 0.05 and |log2(fold change)| ≥ 1. The selection of WNT pathway-related genes in Figure 1B is based on these analyses.

      Significance of the heatmap genes: The genes displayed in Figure 1B were selected based on their significant differential expression and enrichment in pathways relevant to PE pathogenesis, such as the WNT signaling pathway. We have clarified this in the Results section and updated the figure legend to explain their biological relevance.

      Purpose of Figures 1A and 1B: Figure 1A emphasizes the downregulation of ACVR2A in PE placentas, while Figure 1B complements this by presenting differentially expressed genes associated with the WNT pathway. These figures collectively highlight the role of ACVR2A in PE and its connection to broader molecular pathways. Text descriptions have been updated to improve clarity and focus.
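
      The two thresholds stated above (FDR < 0.05 and |log2(fold change)| ≥ 1) amount to a simple filter on a differential expression results table. The sketch below applies that filter in Python to made-up placeholder rows; the gene labels and values are hypothetical and are not taken from GSE114691 or from the DESeq2 output of the study.

```python
def is_differentially_expressed(log2_fold_change, fdr,
                                fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return True when a gene passes both thresholds quoted in the text."""
    return fdr < fdr_cutoff and abs(log2_fold_change) >= lfc_cutoff


# Placeholder rows: (gene label, log2 fold change, adjusted p-value / FDR).
results = [
    ("GENE_A", -1.4, 0.001),   # passes: strongly down, significant
    ("GENE_B", 0.6, 0.001),    # fails: effect size too small
    ("GENE_C", 2.1, 0.200),    # fails: not significant after correction
    ("GENE_D", 1.0, 0.049),    # passes: exactly at the fold-change cutoff
]

selected = [gene for gene, lfc, fdr in results
            if is_differentially_expressed(lfc, fdr)]
print(selected)
```

      Note that the absolute value makes the filter symmetric, so both up- and downregulated genes are retained.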

      (6) More information is needed in the methods section to understand how the immunohistochemistry was quantified. "Quantitation was performed" is all that is provided. Was staining quantified across the whole image or only in anchoring villous areas? How were HRP & hematoxylin signals distinguished in ImageJ? How was the overall level of HRP/DAB development kept constant between the NC and PE groups?

      Thank you for pointing out the need for more details regarding the quantification of immunohistochemistry (IHC). We have now clarified and expanded the description of the IHC quantification process in the Methods section as follows:

      Quantification Across the Entire Section: IHC staining was assessed across the entire tissue section to account for global expression patterns. For quantitative analysis, representative regions from the anchoring villous areas, where ACVR2A expression is most prominent, were selected for comparison between NC and PE groups. This ensured that the analysis focused on biologically relevant regions.

      ImageJ Analysis: Images of stained sections were captured under identical magnifications and lighting conditions. Hematoxylin (blue, nuclear staining) and DAB/HRP (brown, protein-specific signal) were distinguished using ImageJ's color deconvolution plugin. The DAB/HRP signal was isolated and quantified based on the integrated optical density (IOD) within the selected regions.

      Consistency in HRP/DAB Development: To maintain consistency between NC and PE groups, all tissue samples were processed under identical experimental conditions, including the same antibody dilution, incubation times, and DAB/HRP development durations. Negative controls (without primary antibody) were included to monitor background staining, and the DAB reaction was stopped simultaneously across all samples to avoid overdevelopment.

      Statistical Analysis: The quantified DAB signal intensity was normalized to the area of the selected regions, and comparisons between NC and PE groups were performed using statistical tests (e.g., Student’s t-test). Results are reported as mean ± SD.

      We hope this additional detail addresses your concerns.
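
      The normalization and group comparison described above can be sketched as follows: each region's IOD is divided by its area, and the two groups are compared with a pooled-variance (Student's) t statistic. This is a minimal illustration with hypothetical function names and placeholder numbers, not the measurements or the software used in the study; it returns the t statistic and degrees of freedom only.

```python
import math
from statistics import mean, variance


def normalized_intensity(integrated_optical_density, region_area):
    """DAB signal per unit area for one selected region."""
    return integrated_optical_density / region_area


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom); each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, n_a + n_b - 2


# Placeholder (IOD, area) pairs for illustration only.
nc_regions = [(900.0, 300.0), (1200.0, 400.0), (1000.0, 250.0)]
pe_regions = [(500.0, 250.0), (450.0, 300.0), (700.0, 350.0)]
nc = [normalized_intensity(iod, area) for iod, area in nc_regions]
pe = [normalized_intensity(iod, area) for iod, area in pe_regions]
t_stat, dof = student_t(nc, pe)
print(round(t_stat, 2), dof)
```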

      (7) In Figure 1E it is not immediately obvious to many readers where the EVT are. It is probably worth circling or putting an arrow to the little region of ACVR2A+ EVT that is shown in the higher magnification image in Figure 1E. These are actually easier to see in the pictures provided in the supplement Figure 1. Of note, the STB is also staining positive. This is worth pointing out in the results text.

      Thank you for your suggestion regarding Figure 1E. To make the location of the ACVR2A+ extravillous trophoblasts (EVTs) more apparent, we have updated Figure 1E by adding arrows to indicate the regions of EVTs in the higher magnification image. Additionally, we have included annotations in the supplemental Figure S1 to further aid visualization. We appreciate your observation that syncytiotrophoblasts (STBs) also show positive staining for ACVR2A. We have revised the Results section to explicitly mention this finding and its potential significance.

      (8) It is not possible to judge whether the IF images in 1F actually depict anchoring villi. The DAPI is really faint, and it's high magnification, so there isn't a lot of context. Would it be possible to include a lower magnification image that shows where these cells are located within a placental section? It is also somewhat surprising that this receptor is expressed in the cytoplasm rather than at the cell surface. How do the authors explain this?

      Thank you for your suggestion to provide more context for the immunofluorescence (IF) images in Figure 1F. To address this, we have included lower magnification images in Supplementary Figure S2, showing the overall structure of the placental section and the location of the anchoring villi. These images help to contextualize the regions analyzed in Figure 1F, which were selected to clearly illustrate ACVR2A expression in extravillous trophoblasts (EVTs). In Figure 1F, we have focused on higher magnification images for better visualization of ACVR2A staining patterns in EVTs. Regarding the subcellular localization of ACVR2A, the receptor is predominantly expressed on the cell surface, as shown in our images. However, some intracellular staining is also observed, which may reflect receptor trafficking or recycling processes, consistent with the behavior of other activin receptors under physiological or pathological conditions. We have clarified these points in the Results and Discussion sections.

      (9) The results text makes it sound like the data in Figure 2A are from NCBI & Protein atlas, but the legend says it is qPCR from this lab. The methods do not detail how these various cell lines were grown; only HTR-SVNeo cell culture is described. Similarly, JAR cells are used for several experiments and their culture is not described.

      Thank you for pointing out the need for clarification regarding Figure 2A and cell culture methods. The data in Figure 2A were generated using RT-qPCR conducted in our laboratory, not solely based on data from NCBI or the Human Protein Atlas. We have revised the Results section to reflect this more accurately. Regarding the culture conditions, we acknowledge that the methods for other cell lines were not explicitly detailed. For this study, all cell lines, including JAR and other cancer cell lines, were cultured following standard protocols provided by the suppliers. Specifically, JAR cells and other cell lines were purchased from Wuhan Punosei Life Technology and were maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin under standard conditions (37°C, 5% CO<sub>2</sub>). This information has been added to the Methods section for clarity.

      (10) Under RT-qPCR methods, the phrase "cDNA reverse transcription cell RNA was isolated..." does not make any sense.

      Thank you for pointing out the unclear phrasing in the RT-qPCR methods section. We agree that the original description was not precise. To address this, we have revised the relevant section to improve clarity and accuracy. Specifically, the methods now explicitly describe the two key steps: RNA isolation and cDNA synthesis. The revised text reads: Total RNA was isolated from cells using a Total RNA Extraction Kit (TIANGEN, China) following the manufacturer’s instructions. The extracted RNA was reverse-transcribed into complementary DNA (cDNA) using a cDNA Synthesis Kit (Takara, Japan) according to the protocol provided by the manufacturer.

      (11) The paragraph beginning "Consequently, a potential association..." is quite confusing. It mentions analyzing ACVR2A expression in placentas, but then doesn't point to any results of this kind and repeats describing the results in Figure 2a, from various cell lines.

      Thank you for your comment regarding the paragraph beginning with "Consequently, a potential association...". We understand that the current wording may create confusion. The primary aim of this section is to compare ACVR2A expression levels across various cell lines, including trophoblast-derived and non-trophoblast cell lines, to highlight the relevance of ACVR2A in trophoblast function, particularly in invasion and migration. To address your concerns, we have revised the paragraph for clarity and logical flow. The updated text explicitly focuses on the comparison of ACVR2A expression across cell lines (Figure 2A) and how this supports the hypothesis that ACVR2A plays a key role in trophoblast invasion and migration. Additionally, the discussion of placental samples has been separated to avoid confusion with cell line results. We hope this revision resolves the issue.

      (12) The authors should acknowledge that the effect of the ACVR2A knockout on proliferation makes it difficult to draw any conclusions from the trophoblast invasion assays. That is, there might be fewer migrating or invading cells in the knockout lines because there are fewer cells, not because the cells that are there are less invasive. Since this is a central conclusion of the study, it is a major drawback.

      Thank you for highlighting this important point. We agree that the reduced proliferation observed in ACVR2A knockout cells could influence the results of the invasion assays, as fewer cells may inherently lead to reduced invasion. To minimize this effect, we conducted the invasion and migration assays under low-serum conditions (1–2% serum) to limit cell proliferation during the experimental timeframe. This approach was based on optimization trials and existing literature, as serum-free conditions were found to negatively impact cell viability and experimental reproducibility. While these efforts helped to mitigate the impact of proliferation on the results, we acknowledge this as a limitation of our study and have added this discussion to the manuscript. Future studies could incorporate approaches such as normalizing cell numbers or using additional proliferation-independent methods to confirm the findings. We hope this clarification and the steps taken address your concerns.

      (13) The legend and the methods section do not agree on how many fields were selected for counting in the transwell invasion assays in Figure 3C. The methods section and the graph do not match the number of replicate experiments in Figure 3D (the number of replicate experiments isn't described for 3C).

      Thank you for pointing out the inconsistencies regarding the number of fields counted and the number of replicates in the Transwell invasion assays (Figure 3C) and colony formation assays (Figure 3D). We apologize for the lack of clarity in the Methods section and figure legend. To address this, we have revised both the figure legends and the Methods section for consistency and added detailed descriptions. For Figure 3C, cell invasion was quantified by randomly selecting 5 fields of view per sample under 300× magnification. Images shown in the figure were taken at lower magnification to provide a better visual comparison between experimental and control groups. For Figure 3D, each experiment was independently repeated at least 10 times to ensure robust and reproducible results. These clarifications have been incorporated into the revised manuscript. We appreciate your feedback and believe this revision improves the clarity and transparency of our methods.

      (14) Discussion says "Transcriptome sequencing analysis revealed low ACVR2A expression in placental samples from PE patients, consistent with GWAS results across diverse populations." The authors should explain this briefly. Why would SNPs in ACVR2A necessarily affect levels of the transcript?

      Thank you for raising this important point. We acknowledge that our study did not directly investigate how SNPs in the ACVR2A gene affect transcript levels. However, prior studies have suggested that SNPs can influence gene expression through various mechanisms. For example, SNPs in regulatory regions (e.g., promoters, enhancers, or untranslated regions) may affect transcription factor binding, RNA stability, or splicing efficiency, ultimately altering transcript levels. While we did not directly assess the functional consequences of ACVR2A SNPs in this study, the observed downregulation of ACVR2A in PE placentas aligns with the potential regulatory impact of SNPs previously identified in GWAS studies. To address this, we have revised the Discussion section to clarify the relationship between SNPs and transcript levels and acknowledge this limitation.  

      (15) "The expression levels of ACVR2A mRNA were comparable to those of tumor cells such as A549. This discovery suggested a potential pivotal role of ACVR2A in the biological functions of trophoblast cells, especially in the nurturing layer." Alternatively, ACVR2A expression resembles that of tumors because the cell lines used here are tumor cells (JAR) or immortalized cells (HTR8). These lines are widely used to study trophoblast properties, but the discussion should at least acknowledge the possibility that the behavior of these cells does not always resemble normal trophoblasts.

      Thank you for pointing out this important limitation. We agree that the JAR and HTR8/SVneo cell lines, being tumor-derived or immortalized, may not fully replicate the behavior of normal trophoblast cells. While these cell lines are widely used as models for studying trophoblast properties due to their ease of culture and invasive behavior, their gene expression and signaling pathways could partially reflect their tumorigenic or immortalized origins. We have revised the Discussion section to acknowledge this limitation and clarify the interpretation of ACVR2A expression levels in these cells.

      (16) The authors should discuss some of what is known about the relationship between the TCF7/c-JUN pathway and the major signaling pathway activated by ACVR2A, Smad 2/3/4. The Wnt and TGFB family cross-talk is quite complex and it has been studied in other systems.

      Thank you for highlighting the relationship between the TCF7/c-JUN pathway and Smad2/3/4 signaling. In our study, we chose to focus on Smad1/5 due to its strong association with ACVR2A in placental development, as demonstrated in a recent study (DOI: 10.1038/s41467-021-23571-5). This study showed that the BMP signaling pathway, mediated through ACVR2A-Smad1/5, is essential for endometrial receptivity and embryo implantation. While Smad2/3/4 are well-established mediators of TGF-β signaling, Smad1/5 activation is more directly linked to ACVR2A in the context of reproductive biology.

      In PE placentas, we observed a significant downregulation of Smad1/5 expression, which supports the hypothesis that ACVR2A-mediated Smad signaling is disrupted in this condition. Although we did not directly assess Smad2/3/4 in this study, prior research has shown that Smad2/3 can interact with TCF/LEF transcription factors to regulate Wnt-related target genes, suggesting potential cross-talk between these pathways. We have now clarified this rationale and included a discussion of these interactions in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Several points need to be addressed to improve the clarity and robustness of the presented findings:

      (1) From a clinical perspective, several concerns arise regarding the interpretation of these findings. First, the small sample size of 20 patients may not be representative of the broader population, limiting the generalizability of the results. Additionally, although no significant differences in age and pre-pregnancy BMI were observed between the PE and normal control groups, other clinical variables, such as hypertension or gestational diabetes, may also influence ACVR2A expression and contribute to PE development. Furthermore, while the study suggests a correlation between reduced ACVR2A expression and PE, it remains unclear whether this association holds true across different subtypes of PE or whether there are other underlying clinical factors that could account for these changes in gene expression. These factors need to be considered in future studies to better understand the clinical relevance of ACVR2A in PE.

      Thank you for raising these insightful concerns about the clinical interpretation of our findings. We agree that the small sample size of 20 patients may limit the generalizability of our results. To address this, we are actively expanding our cohort by collecting additional clinical samples from PE patients and normotensive controls. This effort aims to strengthen the robustness of our findings and provide stronger evidence for the role of ACVR2A in PE. We would also like to clarify that, during the initial sample collection, we specifically included only PE patients without comorbidities such as gestational diabetes, chronic hypertension, or other pregnancy-related complications. This strict selection criterion was implemented to minimize the potential influence of confounding clinical variables and ensure that our findings specifically reflect the association between ACVR2A expression and PE. While our study provides important initial insights, we recognize the need for larger-scale studies to validate these findings. The ongoing collection of clinical samples will allow us to address this limitation and enhance the translational relevance of our research. We have revised the manuscript to reflect these points and highlight our plans to strengthen the study by increasing the sample size.

      (2) The section "Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9" in the results contains some issues with expression details. The results section should be more structured, with data presented in a more detailed and clear manner, ensuring that there is a clear connection between each experimental step and its corresponding result. For example, the sentence "Following multiple rounds of monoclonal culture, genotype identification, RT-qPCR and Western blotting (WB) analysis for screening, specific double-knockout monoclonal cell lines were distinctly chosen" contains redundant phrasing and unnecessary details, which affect the flow of the text.

      Thank you for your constructive feedback on the “Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9” section. We agree that this section can be better structured to present the data in a more detailed and coherent manner. To address this, we have reorganized the results into distinct steps, ensuring a clear connection between each experimental step and its corresponding result. Redundant phrasing has been removed to improve the flow and readability of the text. The revised section emphasizes the purpose of each step, the screening process, and the specific results obtained.

      (3) The figure legends and panel labels in Figure 3 should be revised to ensure clarity and consistency. The figure legend should specify the exact panels (e.g., Figure 3A, 3B, 3C, etc.) and clearly describe the experimental conditions and results shown in each part.

      Thank you for pointing out the need for improved clarity and consistency in the figure legends and panel labels for Figure 3. We have revised the figure legend to specify each panel (e.g., Figure 3A, 3B, 3C, etc.) and included detailed descriptions of the experimental conditions and results displayed in each part. These updates aim to ensure better understanding and alignment between the figure legend and the panels.

      (4) Lack of In Vivo Validation of ACVR2A Knockout: The study does not include in vivo experiments to validate the effects of ACVR2A knockout. It would be important to investigate whether similar regulatory effects of ACVR2A on trophoblast cell migration and invasion can be observed in animal models or in larger clinical studies. The lack of in vivo data raises questions about the translational relevance of the findings.

      Thank you for highlighting the importance of in vivo validation to assess the translational relevance of our findings. While we acknowledge that in vivo experiments could provide additional insights into the role of ACVR2A in trophoblast migration and invasion, this study was primarily designed as an in vitro investigation to explore the molecular mechanisms underlying ACVR2A function in trophoblast cells. The choice of an in vitro model allowed us to perform precise and controlled mechanistic analyses, which are critical for establishing a foundation for future research. We agree that in vivo studies using animal models or larger clinical cohorts are important next steps to validate the regulatory effects of ACVR2A on trophoblast function and its contribution to PE pathogenesis. These directions will be pursued in future research to further establish the translational potential of our findings. We have included this perspective in the revised Discussion section.

      (5) TCF7/c-JUN Pathway in Clinical Samples: In the study of the TCF7/c-JUN pathway, the authors mention assessing protein expression in clinical samples through immunohistochemistry (IHC). However, the manuscript does not provide a clear explanation of how the findings from laboratory cell models (such as HTR8/SVneo and JAR) relate to the clinical samples. Specifically, while ACVR2A knockout is shown to affect these proteins at the cellular level, it is unclear whether this effect is observed in clinical samples. Therefore, further validation of the TCF7/c-JUN pathway in the cell models and exploration of its relationship with protein expression in clinical samples is necessary. Additional experiments, such as immunofluorescence staining or mass spectrometry, could further confirm the role of the TCF7/c-JUN pathway in cells and provide a more direct comparison with clinical data.

      Thank you for highlighting the need to connect findings from cell models to clinical samples, particularly with respect to the TCF7/c-JUN pathway. In response to your comment, we conducted additional experiments using Western blot analysis to evaluate the expression of ACVR2A, SMAD1/5, SMAD4, pSMAD1/5/9, and TCF7L1/TCF7L2 in PE placental tissues compared to normotensive controls (Figure 7A). The results demonstrated significantly reduced expression of these proteins in PE placentas, providing evidence that disruptions in the ACVR2A-SMAD and TCF7/c-JUN signaling pathways observed in vitro are also present in clinical samples.

      These findings strengthen the translational relevance of our study by directly linking the molecular mechanisms identified in cell models to clinical observations. We have updated the Results and Discussion sections to incorporate these new data, and we believe this addition addresses your concern about the relationship between in vitro and clinical findings.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      The authors have constructively responded to previous referee comments and I believe that the manuscript is a useful addition to the literature. I particularly appreciate the quantitative approach to social behavior, but have two cautionary comments.

      (1) Conceptually it is important to further justify why this particular maximum entropy model is appropriate. Maximum entropy models have been applied across a dizzying array of biological systems, including genes, neurons, the immune system, as well as animal behavior, so it would seem quite beneficial to explain the particular benefits here, for mouse social behavior as coarse-grained through the Eco-HAB chamber occupancy. This would be an excellent chance to amplify what the models can offer for biological understanding, particularly in the realm of social behavior.

      We thank the reviewer for this comment. Maximum entropy models, along with other statistical inference methods that learn interaction patterns from simultaneously-measured degrees of freedom, help distinguish various types of interactions, e.g. direct vs. indirect interactions among animals, individual preference to food vs. social interaction with pairs. As research on social behavior expands from focusing on pairs of animals to studying groups in (semi-)naturalistic environments, maximum entropy models serve as a crucial link between high-throughput data and the need to identify and distinguish interaction rules. Specifically, among all possible maximum entropy models, the pairwise maximum entropy model is one of the simplest that can describe interactions among individuals, which serves as an excellent starting point to understand collective and social behavior in animals.
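      As an illustration of how a pairwise maximum entropy model separates individual preferences from pairwise interactions, the following minimal sketch enumerates all states of a small group exactly. It is a generic form under stated assumptions, not necessarily the parameterization used in the manuscript: the names `h` (box preference per mouse) and `J` (coupling counted when two mice share a box), the sign convention, and the toy values are hypothetical.

```python
import itertools
import math


def pairwise_maxent(h, J, n_boxes):
    """Return {state: probability} for a pairwise maximum entropy model.

    state[i] is the box occupied by mouse i.
    h[i][r]: preference of mouse i for box r (hypothetical name).
    J[i][j]: coupling between mice i < j, counted when they share a box.
    Probabilities are proportional to exp(sum of fields + sum of couplings).
    """
    n = len(h)
    weights = {}
    for state in itertools.product(range(n_boxes), repeat=n):
        log_w = sum(h[i][state[i]] for i in range(n))
        log_w += sum(J[i][j]
                     for i in range(n) for j in range(i + 1, n)
                     if state[i] == state[j])
        weights[state] = math.exp(log_w)
    z = sum(weights.values())  # partition function (exact, by enumeration)
    return {s: w / z for s, w in weights.items()}


def colocalization(probs, i, j):
    """Probability that mice i and j occupy the same box."""
    return sum(p for s, p in probs.items() if s[i] == s[j])


# Toy example: 3 mice, 4 boxes, no box preferences, one attractive pair (0, 1).
n_mice, n_boxes = 3, 4
h = [[0.0] * n_boxes for _ in range(n_mice)]
J = [[0.0] * n_mice for _ in range(n_mice)]
J[0][1] = 1.0
probs = pairwise_maxent(h, J, n_boxes)
pair_01 = colocalization(probs, 0, 1)  # raised above the chance level of 1/4
pair_02 = colocalization(probs, 0, 2)  # stays at chance: no direct coupling
```

      Exact enumeration is only feasible for a handful of animals, since the number of states grows as the number of boxes raised to the number of mice; larger groups require approximate inference.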

      Although the Eco-HAB setup currently records spatially coarse-grained data, it still provides more spatial information compared to the traditional three-chamber tests used to assess sociability for rodents. By showing that the maximum entropy model can effectively analyze Eco-HAB data, we hope to highlight its potential in research of social behavior in animals.

      To amplify what the models can offer for biological understanding, particularly in the realm of social behavior, we have restructured the Introduction to motivate more logically the need for maximum entropy models in identifying interactions among mice. Additionally, we updated the first paragraph of the Discussion to state explicitly that it is the use of maximum entropy models that identifies interaction patterns from the high-throughput data. Finally, we have also added to the Discussion (lines 422-425) arguments supporting the specific use of pairwise maximum entropy models to study social behaviors.

      (2) Maximum entropy models of even intermediate size systems involve a large number of parameters. The authors are transparent about that limitation here, but I still worry that the conclusion of the sufficiency of pairwise interactions is simply not general, and this may also relate to the differences from previous work. If, as the authors suggest in the discussion, this difference is one of a choice of variables, then that point could be emphasized. The suggestion of a follow up study with a smaller number of mice is excellent.

      We thank the reviewer for raising the issue and agree that the caveat of how general pairwise interactions can describe social behavior of animals needs to be discussed. We have added a sentence in the Discussion to point out this important caveat. “More generally, this discrepancy when looking at different choices of variables raises the issue that when studying social behavior of animals in a group, it is important to test and compare interaction models with different complexity (e.g. pairwise or with higher-order interactions).” We have also toned down our conclusion to limit our results of pairwise interactions describing mice co-localization patterns to the data collected in Eco-HAB (also see Reviewer 3 Major Point 2).

      Reviewer #3 (Public review):

      Summary:

      Chen et al. present a thorough statistical analysis of social interactions, more precisely, co-occupying the same chamber in the Eco-HAB measurement system. They also test the effect of manipulating the prelimbic cortex by using TIMP-1 that inhibits the MMP-9 matrix metalloproteinase. They conclude that altering neural plasticity in the prelimbic cortex does not eliminate social interactions, but it strongly impacts social information transmission.

      Strengths:

      The quantitative approach to analyzing social interactions is laudable and the study is interesting. It demonstrates that the Eco-HAB can be used for high throughput, standardized and automated tests of the effects of brain manipulations on social structure in large groups of mice.

      Weaknesses:

      A demonstration of TIMP-1 impairing neural plasticity specifically in the prelimbic cortex of the treated animals would greatly strengthen the biological conclusions. The Eco-HAB provides coarser spatial information compared to some other approaches, which may influence the conclusions.

      Recommendations for the authors:  

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Do the Authors have evidence that TIMP-1 was effective, as well as specific to the prelimbic cortex?

      We refer to the literature for the effectiveness and specificity of TIMP-1 to the prelimbic cortex.

      Specifically, the study by Okulski et al. (Biol. Psychiatry 2007) provides clear evidence that TIMP-1 plays a role in synaptic plasticity in the prefrontal cortex. They showed that TIMP-1 is induced in the medial prefrontal cortex (mPFC) following stimulation that triggers late long-term potentiation (LTP), a key model of synaptic plasticity. Overexpression of TIMP-1 in the mPFC blocked the activity of matrix metalloproteinases (MMPs) and prevented the induction of late LTP in vivo. Similar effects were observed with pharmacological inhibition of MMP-9 in vitro, reinforcing the idea that TIMP-1 regulates extracellular proteolysis as part of the plasticity mechanism in the prefrontal cortex. These findings confirm that TIMP-1 is both effective and active in this specific brain region.

      Further evidence comes from Puścian et al. (Mol. Psychiatry 2022), who used TIMP-1-loaded nanoparticles to influence neuronal plasticity in the amygdala. They found that TIMP-1 affected MMP expression, LTP, and dendritic morphology, showing its impact on synaptic modifications. More directly relevant, Winiarski et al. (Sci. Adv. 2025) demonstrated that injecting TIMP-1-loaded nanoparticles into the prelimbic cortex altered responses to social stimuli, further supporting the idea that TIMP-1 has region-specific effects on behavioral processes.

      We have also updated the main text (page 8, 1st paragraph of “Effect of impairing neuronal plasticity in the PL on subterritory preferences and sociability”) of the manuscript to include the above references.

      (2) The Authors seem to suggest that one main reason for the different results compared to Shemesh et al. 2013 was the coarseness of the Eco-HAB data. In this case, I think this conclusion should be toned down because of this significant caveat.

      We thank the reviewer for pointing this out, and agree that this caveat and difference should be emphasized. To tone down the conclusion, we have

      (1) added details about the Eco-HAB (it being coarse-grained, etc.) to the abstract.

      (2) added to the results summary in the Discussion (top of page 12) that the results are “within in the setup of the semi-naturalistic Eco-HAB experiments”

      (3) added to the Discussion (page 13) that the difference in results compared to Shemesh et al. 2013 means that general studies of social behavior need to compare models with different levels of complexity (e.g. pairwise vs. higher-order interactions). (Also see Reviewer 2 Comment 2.)

      Minor points

      (1) Please explain what is measured in Fig. 1C (what is on the y axis?).

      Figure 1C shows the activity of the mice as measured by the rate of transitions, i.e. the number of times the mice switch boxes during each hour of the day, averaged over all N = 15 mice and T = 10 days (cohort M1). The error bars represent the variability of activity across individuals or across days. For mouse-to-mouse variability (blue), we first compute for each mouse its number of transitions averaged over the same hour for all 10 days, then compute the standard deviation of these averages across all 15 mice and plot it as the error bar. For day-to-day variability (orange), we first compute for each day the number of transitions for each hour averaged over all mice, then compute the standard deviation of these averages across all 10 days as the error bar. We have added this detailed explanation to the caption of Figure 1C.
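      The two error bars described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the nested-list layout `counts[mouse][day][hour]`, the function name, and the toy numbers are hypothetical, and the sample standard deviation is an assumption, since the response does not state which estimator was used.

```python
from statistics import mean, stdev


def hourly_activity(counts):
    """Summarize transition rates per hour of the day.

    counts[m][d][h]: number of box transitions of mouse m on day d in hour h
    (hypothetical layout). Returns one dict per hour with the grand mean,
    the mouse-to-mouse SD, and the day-to-day SD.
    """
    n_mice, n_days, n_hours = len(counts), len(counts[0]), len(counts[0][0])
    summary = []
    for h in range(n_hours):
        # Average each mouse over days, then take the spread across mice.
        per_mouse = [mean(counts[m][d][h] for d in range(n_days))
                     for m in range(n_mice)]
        # Average each day over mice, then take the spread across days.
        per_day = [mean(counts[m][d][h] for m in range(n_mice))
                   for d in range(n_days)]
        summary.append({
            "mean": mean(per_mouse),
            "mouse_sd": stdev(per_mouse),  # sample SD (assumption)
            "day_sd": stdev(per_day),      # sample SD (assumption)
        })
    return summary


# Toy data: 2 mice, 2 days, 1 hour (illustrative values only).
toy = [[[2], [4]], [[6], [8]]]
toy_summary = hourly_activity(toy)
```

      Averaging before taking the standard deviation is what separates the two sources of variability: day-to-day noise is averaged out of the mouse-to-mouse bar, and vice versa.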

      (2) In Fig. 3, it would be better to present the control group also in the main figure instead of the supplementary.

      We have merged Figure 3 and Figure 3 Supplementary 1 to present the control group also in the main figure.

      (3) In Fig. 3 and corresponding supplements, there seems to be a large difference between males and females. I think this would deserve some more discussion.

      Although it is not the main focus of this paper, we agree with the reviewer that the difference between males and females is important and deserves attention in the Discussion and in future studies. We have therefore added a paragraph to the Discussion (lines 394-399, bottom of page 12).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this report, the authors made use of a murine cell line derived from a MYC-driven liver cancer to investigate the gene expression changes that accompany the switch from normoxic to hypoxic conditions during 2D growth and the switch from 2D monolayer to 3D organoid growth under normoxic conditions. They find a significant (ca. 40-50%) overlap among the genes that are dysregulated in response to hypoxia in 2D cultures and in response to spheroid formation. Unsurprisingly, hypoxia-related genes were among the most prominently deregulated under both sets of conditions. Many other pathways pertaining to metabolism, splicing, mitochondrial electron transport chain structure and function, DNA damage recognition/repair, and lipid biosynthesis were also identified.

      We thank this reviewer for his/her time and efforts, and the insightful comments.

      Major comments:

      (1) Lines 239-240: The authors state that genes involved in DNA repair were identified as being necessary to maintain survival of both 2D and 3D cultures (Figure S6A). Hypoxia is a strong inducer of ROS. Thus, the ROS-specific DNA damage/recognition/repair pathways might be particularly important. The authors should look more carefully at the various subgroups of the many genes that are involved in DNA repair. They should also obtain at least a qualitative assessment of ROS and ROS-mediated DNA damage by staining for total and mitochondrial-specific ROS using dyes such as CM-H2-DCFDA and MitoSox. Actual direct oxidative damage could be assessed by immunostaining for 8-oxo-dG and related to the sub-types of DNA damage-repair genes that are induced. The centrality of DNA damage genes also raises the question as to whether the previously noted prominence of the TP53 pathway (see point 5 below) might represent a response to ROS-induced DNA damage.

      We thank the reviewer for the insightful comments and agree that ROS induced by hypoxia could play a role in modulating DNA repair and, consequently, cellular essentiality. Although the pathway enrichment in Figure S6A (now Figure 2-figure supplement 4A) showed that the DNA repair pathway was essential to cell survival in hypoxia and 3D cultures, the genes associated with this pathway (Ddb1; Brf2; Gtf3c5; Guk1; Taf6) are not typical DNA repair genes. They are more likely involved in gene transcription. It would nevertheless be interesting to see whether they are specifically involved in the response to ROS-induced DNA damage, which is beyond the scope of this study.

      (2) Because most of the pathway differences that distinguish the various cell states from one another are described only in terms of their transcriptome variations, it is not always possible to understand what the functional consequences of these changes actually are. For example, the authors report that hypoxia alters the expression of genes involved in PDH regulation but this is quite vague and not backed up with any functional or empirical analyses. PDH activity is complex and regulated primarily via phosphorylation/dephosphorylation (usually mediated by PDK1 and PDP2, respectively), which in turn are regulated by prevailing levels of ATP and ADP. Functionally, one might expect that hypoxia would lead to the down-regulation of PDH activity (i.e. increased PDH-pSer392) as respiration changes from oxidative to non-oxidative. This would not be appreciated simply by looking at PDH transcript levels. This notion could be tested by looking at total and phospho-PDH by western blotting and/or by measuring actual PDH activity as it converts pyruvate to AcCoA.

      We agree with this reviewer that PDH activity could be regulated by multiple factors, and that this merits further validation by other approaches.

      (3) Line 439: Related to the above point: the authors state: "It is likely that blockade of acetyl-CoA production by PDH knockout may force cells to use alternative energy sources under hypoxic and 3D conditions, averting the Warburg effect and promoting cell survival under limited oxygen and nutrient availability in 3D spheroids." This could easily be tested by determining whether exogenous fatty acids are more readily oxidized by hypoxic 2D cultures or spheroids than occurs in normoxic 2D cultures.

      We thank the reviewer for this suggestion. We apologize for not being able to validate everything.

      (4) Line 472: "Hypoxia induces high expression of Acaca and Fasn in NEJF10 cells indicating that hypoxia promotes saturated fatty acid synthesis...The beneficial effect of Fasn and Acaca KO to NEJF10 under hypoxia is probably due to reduction of saturated fatty acid synthesis, and this hypothesis needs to be tested in the future." As with the preceding comment, this supposition could readily be supported directly by, for example, performing western blots for these enzymes and by showing that hypoxic 2D cells or spheroids convert more AcCoA into lipid.

      We thank the reviewer for this suggestion. However, functional validation of the Fasn and Acaca KO is beyond the scope of this study.

      (5) In Supplementary Figure 2B&C, the central hub of the 2D normoxic cultures is Myc (as it should well be) whereas, in the normoxic 3D, the central hub is TP53 and Myc is not even present. The authors should comment on this. One would assume that Myc levels should still be quite high given that Myc is driven by an exogenous promoter. Does the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?

      The predicted transcription factor activity analysis was based on the differential ATAC-seq peaks identified through pairwise comparisons among the different culture conditions. If TP53 or MYC does not appear under a given condition, this does not mean that its activity is absent.

      “…the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?” This reviewer has raised an interesting question. We are investigating this hypothesis and hopefully we can give a clear answer in the future.

      (6) In the Materials and Methods section (lines 711-720), the description of how spheroid formation was achieved is unclear. Why were the cells first plated into non-adherent 96 well plates and then into non-adherent T75 flasks? Did the authors actually utilize and expand the cells from 144 T75 flasks and did the cells continue to proliferate after forming spheroids? Many cancer cell types will initially form monolayers when plated onto non-adherent surfaces such as plastic Petri dishes and will form spheroid-like structures only after several days. Other cells will only aggregate on the "non-adherent" surface and form spheroid-like structures but will not actually detach from the plate's surface. Have the authors actually documented the formation of true, non-adherent spheroids at 2 days and did they retain uniform size and shape throughout the collection period? The single photo in Supplementary Figure 1 does not explain when this was taken. The authors include a schematic in Figure 2A of the various conditions that were studied. A similar cartoon should be included to better explain precisely how the spheroids were generated and clarify the rationale for 96 well plating. Overall, a clearer and more concise description of how spheroids were actually generated and their appearance at different stages of formation needs to be provided.

      The cells were initially plated in non-adherent 96-well plates to facilitate the formation of spheroids in a controlled and uniform manner. As correctly mentioned by the reviewer, during the initial stages, cells cultured on non-adherent surfaces often form aggregates or clumps, and it takes a few days for them to develop into solid spheroids.

      In our study, we aimed to achieve 3D spheroid formation immediately following the transduction process to allow for screening under both 2D and 3D conditions. Plating the cells into 96-well plates enabled us to monitor and control the formation of spheroids in smaller volumes before scaling up the culture in non-adherent T75 flasks for subsequent experimental steps. This setup allows us to maintain gene editing processes under both 2D and 3D conditions.

      Regarding the proliferation and uniformity of spheroids:

      • Yes, the spheroids continued to proliferate after their formation.

      • True, non-adherent spheroids were documented as early as the next day. This was visually confirmed under microscopy, and size uniformity was maintained throughout the collection period by following optimized culture protocols.

      We also agree with the reviewer’s suggestion to include a cartoon schematic similar to Figure 2A, illustrating the spheroid generation process and clarifying the rationale for using 96-well plates. We have included such a cartoon, together with a spheroid growth curve monitored by Incucyte, as Figure 2-figure supplement 2.

      (7) The authors maintained 2D cultures in either normoxic or hypoxic (1% O2) states during the course of their experiments. On the other hand, 3D cultures were maintained under normoxic conditions, with the assumption that the interiors of the spheroids resemble the hypoxic interiors of tumors. However, the actual documentation of intra-spheroid hypoxia is never presented. It would be a good idea for the authors to compare the degree of hypoxia achieved by 2D (1% O2) and 3D cultures by staining with a hypoxia-detecting dye such as Image-iT Green. Comparing the fluorescence intensities in 2D cultures at various O2 concentrations might even allow for the construction of a "standard curve" that could serve to approximate the actual internal O2 concentration of spheroids. This would allow the authors to correlate the relative levels of hypoxia between 2D and 3D cultures.

      This is an excellent idea, which we will certainly pursue in our future experiments.
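
      As an illustration only (not part of the original study), the "standard curve" idea in point 7 could be sketched as a piecewise-linear interpolation from dye fluorescence to approximate % O2. The function name `estimate_o2`, the calibration values, and the assumption that the dye signal changes monotonically with O2 are all hypothetical; a real dye response may be nonlinear and would need to be calibrated empirically.

```python
def estimate_o2(signal, standards):
    """Interpolate % O2 from a dye signal using (o2_percent, signal) standards.

    Assumes (hypothetically) that the signal changes monotonically with O2.
    """
    pts = sorted(standards, key=lambda p: p[1])  # order standards by signal
    if not pts[0][1] <= signal <= pts[-1][1]:
        # Do not extrapolate beyond the calibrated range.
        raise ValueError("signal outside the calibrated range")
    for (o2_a, s_a), (o2_b, s_b) in zip(pts, pts[1:]):
        if s_a <= signal <= s_b:
            if s_b == s_a:
                return (o2_a + o2_b) / 2
            frac = (signal - s_a) / (s_b - s_a)
            return o2_a + frac * (o2_b - o2_a)


# Made-up calibration points: (% O2 in 2D culture, mean fluorescence, a.u.)
standards = [(21.0, 100.0), (5.0, 400.0), (1.0, 900.0)]

# Hypothetical spheroid-core fluorescence reading
print(estimate_o2(650.0, standards))
```

      Readings outside the calibrated range raise an error rather than being extrapolated, since the curve is only defined between the measured 2D standards.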

      (8) Related to the previous 2 points, the authors performed RNAseq on spheroids only 48 hours after initiating 3D growth. I am concerned that this might not have been a sufficiently long time for the cells to respond fully to their hypoxic state, especially given my concerns in Point 6. Might the results have been even more robust had the authors waited longer to perform RNA seq? Why was this short time used?

      We agree with this reviewer. We are unsure whether 48 hours was the ideal timepoint. A longitudinal experiment that harvests samples at different timepoints may be necessary in future experiments.

      (9) What happens to the gene expression pattern if spheroids are re-plated into standard tissue culture plates after having been maintained as spheroids? Do they resume 2D growth and does the gene expression pattern change back?

      This is a great question; we had not considered what the gene expression pattern would be if spheroids were re-plated in 2D. This could be a challenging experiment because the gene expression and epigenetic changes are time-dependent. However, the cells do grow well after being re-plated in 2D.

      (10) Overall, the paper is quite descriptive in that it lists many gene sets that are altered in response to hypoxia and the formation of spheroids without really delving into the actual functional implications and/or prioritizing the sets. Some of these genes are shown by CRISPR screening to be essential for maintaining viability although in very few cases are these findings ever translated into functional studies (for example, see points 1-4 above). The list of genes and gene pathways could benefit from a better explanation and prioritization of which gene sets the authors believe to be most important for survival in response to hypoxia and for spheroid formation.

      This was a genome-wide study that integrated RNA-seq, ATAC-seq and CRISPR KO, providing a resource for understanding the oncogenic pathways in different culture conditions. We believe we have clearly articulated the important genes/pathways in our abstract.

      (11) The authors used a single MYC-driven tumor cell line for their studies. However, in their original paper (Fang, et al. Nat Commun 2023, 14: 4003.) numerous independent cell lines were described. It would help to know whether RNAseq studies performed on several other similar cell lines gave similar results in terms of up & down-regulated transcripts (i.e., whether NEJF10 cells are representative of the other cell lines).

      We have not generated RNA-seq data for these cell lines cultured in different conditions.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Fang et al. provides a tour-de-force study uncovering cancer cells' varied dependencies on several gene programs for their survival under different biological contexts. The authors addressed genomic differences in 2D vs 3D cultures and how hypoxia affects gene expression. They used a Myc-driven murine liver cancer model grown in 2D monolayer culture in normoxia and hypoxia as well as cells grown as 3D spheroids and performed a CRISPR-based genome-wide KO screen to identify genes that play important roles in cell fitness. Some context-specific gene effects were further validated by in-vitro and in-vivo gene KO experiments.

      Strengths:

      The key findings in this manuscript are:

      (1) Close to 50% of differentially expressed genes were common between 2D Hypoxia and 3D spheroids conditions but they had differences in chromatin accessibility.

      (2) VHL-HIF1a pathway had differential cell fitness outcomes under 2D normoxia vs 2D hypoxia and 3D spheroids.

      (3) Individual components of the mitochondrial respiratory chain complex had contrasting effects on cell fitness under hypoxia.

      (4) Knockout of organogenesis or developmental pathway genes led to better cell growth specifically in the context of 3D spheroids and knockout of epigenetic modifiers had varied effects between 2D and 3D conditions.

      (5) Another key program that leads to different cell fitness outcomes in normoxia vs hypoxia is lipid and fatty acid metabolism.

      (6) Prmt5 is a key essential gene under all growth conditions, but in the context of 3D spheroids even partial loss of Prmt5 has a synthetic lethal effect with Mtap deletion and Mtap is epigenetically silenced specifically in the 3D spheroids.

      We thank this reviewer for acknowledging the strengths of our study.

      Issues to address:

      (1) The authors should clarify the link between the findings of the enrichment of TGFb-SMAD signaling REACTOME pathway to the findings that knocking out TGFb-SMAD pathway leads to better cell fitness outcomes for cells in the 3D growth conditions.

      We have clarified this link in the abstract by stating: “Notably, multicellular organogenesis signaling pathways including TGFb-SMAD, which is upregulated in 3D culture, specifically constrict the uncontrolled cell proliferation in 3D while inactivation of epigenetic modifiers (Bcor, Kmt2d, Mettl3 and Mettl14) has opposite outcomes in 2D vs. 3D.”

      (2) Supplementary Figure 4C has been cited in the text but doesn't exist in the supplementary figures section.

      We apologize for this typo. It should be Supplementary Figure 5C, which is now Figure 2-figure supplement 3C in the new version of the manuscript. We have corrected it.

      (3) A small figure explaining this ABC-Myc driven liver cancer model in Supplementary Figure 1 would be helpful to provide context.

      We appreciate this suggestion. We have added a cartoon as Figure 1-figure supplement 1A to indicate the procedure for generation of this model.

      (4) The method for spheroids formation is not found in the method section.

      We described the method in our previous publication (Nature Communications 2023 Jul 6;14(1):4003.). We have now also added this information to the Methods; the procedure is very simple (lines 623-624). We found that the murine liver cancer cell lines readily form spheroids when they are cultured in low-attachment dishes with standard DMEM complete media.

      (5) In Supplementary Figure 1b, the comparisons should be stated the opposite way - 3D vs 2D normoxia and 2D-Hypoxia vs 2D-Normoxia.

      We have corrected the legend of Figure S1B, which is now Figure 1B in the new version of the manuscript.

      (6) There are typos in the legend for Supplementary Figure 10.

      We have checked the typos.

      (7) Consider putting Supplementary Figure 1b into the main Figure 1.

      We have moved both Supplementary Figure 1a and 1b into main Figure 1 as Figure 1A and 1B. We hope this will help readers find the information more easily.

      (8) Please explain why only one timepoint (endpoint) was collected for 3D spheroids in the CRISPR KO screen experiment, while several timepoints were collected for 2D conditions. Was this for technical convenience?

      As this reviewer speculated, indeed this was for technical convenience. We found that it was technically challenging to split the spheroids for CRISPR screening.

      (9) In line 372, it is indicated that Bcor KO (Fig 5e) had a growth advantage, but this was observed with only one of the gRNAs. The same applies to Kmt2d KO in the same figure, where there was an opposite effect. Please justify the use of only one gRNA.

      We actually used 4 gRNAs for each gene. In the heatmap, although one of the gRNAs for each gene showed some level of enrichment under the hypoxic 2D condition, all of them were highly enriched in 3D.

      (10) Why was shRNA used for the PRMT5 gene rather than a CRISPR-based KO strategy? Note that one of the shRNAs for PRMT5 produced almost no knockdown (PRMT5-shRNA2, Figure 7B) but still showed a phenotype (Figure 7D). Please explain.

      We used shRNA as a second approach for cross-validation. We agree that the knockdown efficiency of shRNA2 was not as good as that of the others, at only about 40%.

      (11) In Figure 7D, which samples (which shRNA group) were being compared to do the t-test?

      The comparisons were between shCtrl and each of the shPRMT5 groups. We have clarified this in the figure legend.

      (12) In line 240, it is stated that oxphos gene set is essential for NEJF10 cell survival in both normoxia and hypoxia conditions. But shouldn't oxphos be non-essential in hypoxia as cells move away from oxphos and become glycolytic?

      This is a great question. While hypoxia may indeed promote the switch from oxphos to glycolysis, several studies have shown that the low oxygen concentrations in hypoxic regions of tumors may not be limiting for oxphos, and that ATP is generated by oxphos in tumors even at very low oxygen tensions (please see the review in Clin Cancer Res (2018) 24 (11): 2482–2490.). We therefore speculate that NEJF10 cells remain dependent on oxphos for ATP production under hypoxia. However, this needs further investigation. We have added this discussion to our manuscript (lines 250-254).

      (13) In line 485, it is mentioned that knockout of Pmvk and Mvd, genes involved in cholesterol synthesis, had a positive effect on cell growth in 3D conditions. Since cholesterol synthesis is essential for cell growth, how does this not matter much in the context of 3D? Please explain.

      We thank this reviewer for this note. It appears that only two gRNAs for each gene were enriched in 3D, which could be due to a technical issue or clonal selection. We have deleted this sentence in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, Fang et al. systematically investigate the effects of culture conditions on gene expression, genome architecture, and gene dependency. To do this, they cultivate the murine HCC line NEJF10 under standard culture conditions (2D), then under similar conditions but under hypoxia (1% oxygen, 2D hypoxia) and under normoxia as spheroids (3D). NEJF10 was isolated from a murine HCC model that relies exclusively on MYC as a driver oncogene. In principle, (1) RNA-seq, (2) ATAC-seq and (3) genetic screens were then performed in this isogenic system and the results were systematically compared in the three cultivation methods. In particular, genome-wide screens with the CRISPR library Brie were performed very carefully. For example, in the 2D conditions, many different time points were harvested to control the selection process kinetically. The authors note differential dependencies for metabolic processes (not surprisingly, hypoxia signaling is affected) such as the regulation and activity of mitochondria, but also organogenesis signaling and epigenetic regulation.

      Strengths:

      The topic is interesting and relevant and the experimental set-up is carefully chosen and meaningful. The paper is well written. While the study does not reveal any major surprises, the results represent an important resource for the scientific community.

      We thank this reviewer for his/her positive comments.

      Weaknesses:

      However, this presupposes that the statistical analysis and processing are carried out very carefully, and this is where my main suggestions for revision begin. Firstly, I cannot find any information on the number of replicates in RNA- and ATAC-seq. This should be clearly stated in the results section and figure legends and cut-offs, statistical procedures, p-values, etc. should be mentioned as well. In principle, all NGS experiments (here ATAC- and RNA-seq) should be performed in replicates (at least duplicates, better triplicates) or the results should be validated by RT-PCR in independent biological triplicates. Secondly, the quantification of the analyses shown in the figures and especially in the legends is not sufficiently careful. Units are often not mentioned. Example Figure 4a: The legend says: 'gRNA reads' but how can the read count be -1? I would guess these are FC, log2FC, or Z-values. All figure legends need careful revision.

      Based upon the reviewer’s suggestions, we have added details about the replicates to the figure legends. For the gRNA read heatmaps, the scale bar indicates the Z score. We have added this information to the figure legends.
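
      To illustrate why a heatmap scaled this way can show values such as -1, the following minimal sketch computes per-gRNA Z-scores across conditions. It is an assumption-laden example, not the authors' pipeline: the log2 transform, the pseudocount, the function name `row_zscores`, and the read counts are all hypothetical.

```python
import math
from statistics import mean, pstdev


def row_zscores(counts, pseudocount=1.0):
    """Z-score one gRNA's log2 read counts across samples (hypothetical scheme)."""
    logged = [math.log2(c + pseudocount) for c in counts]
    mu = mean(logged)
    sigma = pstdev(logged)  # population standard deviation across samples
    if sigma == 0:
        # A gRNA with identical counts everywhere carries no contrast.
        return [0.0 for _ in logged]
    return [(x - mu) / sigma for x in logged]


# Made-up normalized read counts for one gRNA:
# 2D normoxia, 2D hypoxia, 3D spheroids
example_counts = [120, 480, 1900]
z = row_zscores(example_counts)
print([round(v, 2) for v in z])
```

      Because each row is centered on its own mean, conditions in which a gRNA is depleted relative to the others receive negative values, even though raw read counts are never negative.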

      Furthermore, I would find a comparison of the sgRNA abundances at the earliest harvesting time with the distribution in the library interesting, to see whether and to what extent selection has already taken place before the three culture conditions were established (minor point).

      This is a great point. Unfortunately, we did not perform such an analysis.

      Recommendations for the authors:

      Reviewing Editor:

      There are three general issues:

      First, there is a lack of detail regarding much of the analysis. In some cases, this makes it difficult to assess the value of the data, although there is a general consensus that the information is really interesting.

      Second, the findings - although provocative - lack mechanistic details and are focused more on descriptive findings. Hence, the manuscript would be improved by some effort at evaluating identified programs and providing some suggestions of mechanisms.

      Third, the authors need to put much more effort into the clarity and tightness of the presentation.

      We have made clarifications in response to the reviewers’ comments.

      Reviewer #1 (Recommendations for the authors):

      Figure S1C. the labeling of the lower x-axis is inverted.

      Due to space limitations, we had changed the figure orientation in the old version of the manuscript. We have rotated the figure back in the new version, where it is now Figure 1-figure supplement 1B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors address the role of the centromere histone core in force transduction by the kinetochore.

      Strengths:

      They use a hybrid DNA sequence that combines CDEII and CDEIII as well as Widom 601 so they can make stable histones for biophysical studies (provided by the Widom sequence) and maintain features of the centromere (CDE II and III).

      Weaknesses:

      The main results are shown in one figure (Figure 2). Indeed the Centromere core of Widom and CDE II and III contribute to strengthening the binding force for the OA-beads. The data are very nicely done and convincingly demonstrate the point. The weakness is that this is the entire paper. It is certainly of interest to investigators in kinetochore biology, but beyond that, the impact is fairly limited in scope.

      This reviewer might have missed that this is a Research Advance, not an article. Research Advances are limited in scope by definition and provide a new development that builds on research reported in a prior paper. They can be of any length. Our Research Advance builds on our prior work, Hamilton et al., 2020, and provides the new result that native centromere sequences strengthen the attachment of the kinetochore to the nucleosome.

      Reviewer #2:

      Summary:

      This paper provides a valuable addendum to the findings described in Hamilton et al. 2020 (https://doi.org/10.7554/eLife.56582). In the earlier paper, the authors reconstituted the budding yeast centromeric nucleosome together with parts of the budding yeast kinetochore and tested which elements are required and sufficient for force transmission from microtubules to the nucleosome. Although budding yeast centromeres are defined by specific DNA sequences, this earlier paper did not use centromeric DNA but instead the generic Widom 601 DNA. The reason is that it has so far been impossible to stably reconstitute a budding yeast centromeric nucleosome using centromeric DNA.

      In this new study, the authors now report that they were able to replace part of the Widom 601 DNA with centromeric DNA from chromosome 3. This makes the assay more closely resemble the in vivo situation. Interestingly, the presence of the centromeric DNA fragment makes one type of minimal kinetochore assembly, but not the other, withstand stronger forces.

      We thank the reviewer for their careful and positive assessment of our work.

      Which kinetochore assembly turned out to be affected was somewhat unexpected, and can currently not be reconciled with structural knowledge of the budding yeast centromere/kinetochore. This highlights that, despite recent advances (e.g. Guan et al., 2021; Dendooven et al., 2023), aspects of budding yeast kinetochore architecture and function remain to be understood and that it will be important to dissect the contributions of the centromeric DNA sequence.

      We couldn’t agree more.

      Given the unexpected result, the study would become yet more informative if the authors were able to pinpoint which interactions contribute to the enhanced force resistance in the presence of centromeric DNA.

      Strength:

      The paper demonstrates that centromeric DNA can increase the attachment strength between budding yeast microtubules and centromeric nucleosomes.

      Weakness:

      How centromeric DNA exerts this effect remains unclear.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Additional specific mutants would be helpful in interpreting the effect observed. The authors speculate that a small segment of OA near the DNA (based on Dendooven et al., 2023) could be important. Would it be possible to introduce specific mutations and test this?

      This would be an interesting study but is far beyond the scope of a Research Advance. In fact, it would make a nice thesis project for a new student. Although perhaps not obvious, these studies require a large set of reagents including wrapped nucleosomes, which must be made fresh (they cannot be frozen) and five purified recombinant complexes, purified by specialized protocols that maintain their activity. Moreover, each datapoint is gathered one at a time. For example, the data in Figure 2 in this manuscript includes 343 datapoints acquired one at a time over the course of 1.5 years.  

      (2) Please provide the sequences of the other CEN3-W601 chimeras that were tested and did NOT stably wrap centromeric histone octamers. This may help others to design yet different constructs in the future. (Maybe the information is there and I didn't see it?)

      We fully agree and thank the reviewer for this excellent suggestion. The sequences and summaries of their wrapping stability are now provided in Table 3, page 17.

      (3) I wonder whether the authors tested the C0N3 sequence used in Dendooven et al., 2023. If not, could it be tested? This would more tightly couple the functional assay shown here with the structural work.

      We did not test the CON3 sequence, which was published several years after the start of this work. We agree that a tight coupling between the functional assay and the structural work would be useful. However, we also see the advantage of being able to go beyond the structural work and include even more CEN3 sequence than has so far been possible in the structural work.  

      In addition to measuring the role of DNA sequence in Okp1/Ame1 attachment to the nucleosome, we were interested in the role of DNA sequence in the attachment of Mif2. Therefore, we included all 35 bp of the Mif2 footprint in our chimeric CCEN DNA sequence. CON3 only includes 8 bp from CDEII. We did produce stable nucleosomes using CEN3-601 from Guan et al. (see Table 3). Again, CEN3-601 only includes 8 bp of the Mif2 footprint so we opted to study nucleosomes wrapped in our CCEN DNA with the entire Mif2 footprint. Curiously we found that even the entire Mif2 footprint was not enough to find the DNA sequence specificity seen in the EMSA experiments reported by Xiao et al., 2017.

      To help readers understand the differences between all these constructs, we have included them in Table 3.

      (4) Would an AlphaFold 3 prediction of the assemblies used in this paper be feasible and useful?

      The structures of the Dam1 complex (Jenni et al., 2018), Ndc80 complex (Zahm et al., 2023 and references therein), MIND complex (Dimitrova et al., 2016), OA complex (Dendooven et al., 2023), and the nucleosome (Xiao et al., 2017; Yan et al., 2019; Guan et al., 2021; Dendooven et al., 2023) are published. The interactions between many of these complexes are understood beyond the level that AlphaFold3 could provide (Dimitrova et al., 2016; Dendooven et al., 2023). One of the main questions is how Mif2 interacts with the nucleosome and the other components of the kinetochore. Even structural analyses that included Mif2 in the assembly detect little or no Mif2 in the final structure. Unfortunately, AlphaFold3 is also not helpful as it predicts only the structure of the dimerization domain, which was already known (Cohen et al., 2008).

      AlphaFold3 predicts the rest of Mif2 is largely unstructured with several alpha helices predicted with low confidence.

      (5) Given that the centromeric DNA piece included should be able to bind the CBF3 complex, would it be possible to add this complex and test the effect on force transmission?

      This would be an interesting experiment, and we do expect CBF3 to bind. As stated above, this is far beyond the scope of this Research Advance. In our experience, each new kinetochore subcomplex that we add to our reconstitutions brings new challenges in purifying the subcomplex in active form and in sufficient quantity. We are eager to add CBF3, but this is not something we can accomplish in the context of this Research Advance. Thank you again for the time and energy spent reviewing our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to analyse the roles of the teichoic acids of Streptococcus pneumoniae in supporting the maintenance of the periplasmic region. Previous work has proposed the periplasm to be present in Gram positive bacteria and here an advanced electron microscopy approach was used. This also showed a likely role for both wall and lipo-teichoic acids in maintaining the periplasm. Next, the authors use a metabolic labelling approach to analyse the teichoic acids. This is a clear strength as this method cannot be used for most other well studied organisms. The labelling was coupled with super-resolution microscopy to be able to map the teichoic acids at the subcellular level and a series of gel separation experiments to unravel the nature of the teichoic acids and the contribution of genes previously proposed to be required for their display. The manuscript could be an important addition to the field but there are a number of technical issues which somewhat undermine the conclusions drawn at the moment. These are shown below and should be addressed. More minor points are covered in the private Recommendations for Authors.

      Weaknesses to be addressed:

      (1) l. 144 Was there really only one sample that gave this resolution? Biological repeats of all experiments are required.

      CEMOVIS is a very challenging method that is not amenable to numerous repeats. However, multiple images were recorded from at least two independent samples for each strain. Additional sample images are shown in a new Fig. S3.

      CETOVIS is even more challenging (only two publications in PubMed since 2015) and was performed on a single ultrathin section that, exceptionally, lay perfectly flat on the EM grid, allowing tomography data acquisition on ∆tacL cells. The reconstructed tomogram confirmed the absence of a granular layer in the depth of the section. Additionally, the numbering of Fig. S4A-B (previously misidentified as Fig. S2A-B) has been corrected in the text of V2.

      (2) Fig. 4A. Is the pellet recovered at "low" speeds not just some of the membrane that would sediment at this speed with or without LTA? Can a control be done using an integral membrane protein and Western Blot? Using the tacL mutant would show the behaviour of membranes alone.

      We think that the pellet is not just some of the membrane but most of it. In support of this view, the “low” speed pellets after enzymatic cell lysis contain not just some membrane lipids, but most of them (Fig. S10A). We therefore expect membrane proteins to be also present in this fraction. We performed a Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). Unfortunately, no signal was detected, most likely due to protein degradation by contaminant proteases that we could trace to the purchased mutanolysin. The same sedimentation properties were observed with the ∆tacL strain, as shown in Fig. 6A. However, in the ∆tacL strain the membrane pellet still contains membrane-bound TA precursors. It is therefore impossible to test definitively whether pneumococcal membranes totally devoid of TA would sediment in the same way.

      (3) Fig. 4A. Using enzymatic digestion of the cell wall and then sedimentation will allow cell wall associated proteins (and other material) to become bound to the membranes and potentially affect sedimentation properties. This is what is in fact suggested by the authors (l. 1000, Fig. S6). In order to determine if the sedimentation properties observed are due to an artefact of the lysis conditions a physical breakage of the cells, using a French Press, should be carried out and then membranes purified by differential centrifugation. This is a standard, and well-established method (low-speed to remove debris and high-speed to sediment membranes) that has been used for S. pneumoniae over many years but would seem counter to the results in the current manuscript (for instance Hakenbeck, R. and Kohiyama, M. (1982), Purification of Penicillin-Binding Protein 3 from Streptococcus pneumoniae. European Journal of Biochemistry, 127: 231-236).

      Thank you for this suggestion. We have tested this hypothesis by breaking cells with a Microfluidizer followed by differential centrifugation. This experiment, which requires a large minimum volume, was performed with unlabeled cells (due to the cost of reagents) and assessed by Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). In this case, the majority of the membrane material was found in the high-speed pellet, as expected.

      We also applied the spheroplast lysis procedure of Flores-Kim et al. to the labeled cells, and found that most of the labeled material sedimented at low speed (new Fig. S7B), as observed with our own procedure.

      With these new results, the section on membrane density has been removed from the Supplementary Information. Instead, the fractionation is further discussed in terms of size of membrane fragments and presence of intact spheroplasts in the notes in Supplementary Information preceding Fig. S7.

      (4) l. 303-305. The authors suggest that the observed LTA-like bands disappear in a pulse chase experiment (Fig. 6B). What is the difference between this and Fig. 5B, where the bands do not disappear? Fig. 5C is the WT and was only pulse labelled for 5 min and so would one not expect the LTA-like bands to disappear as in 6B?

      Fig. 6B shows a pulse-chase experiment with strain ∆tacL, whereas Fig. 5C shows a similar experiment with the parental WT strain. The disappearance of the LTA-like band pattern in the ∆tacL strain (Fig. 6B), and its persistence in the WT strain (Fig. 5C), indicate that these bands are the undecaprenyl-linked TA in ∆tacL and proper LTA in the WT. A sentence has been added to better explain this point in V2.

      Note that we have exchanged the previous Fig. 5C and Fig. S13B, so that the experiments of Fig. 5A and 5C are in the same medium, as suggested by Reviewer #2.

      (5) Fig. 6B, l. 243-269 and l. 398-410. If, as stated, most of the LTA-like bands are actually precursor then how can the quantification of LTA stand as stated in the text? The "Titration of Cellular TA" section should be re-evaluated or removed? If you compare Fig. 6C WT extract incubated at RT and 110 °C it seems like a large decrease in amount of material at the higher temperature. Thus, the WT has a lot of precursors in the membrane? This needs to be quantified.

      Indeed, the quantification of the ratio of LTA and WTA in the WT strain rests on the assumption that the amount of membrane-linked polymerized TA precursors is negligible in this strain. This assumption is now stated in the Titration section. We think it is the case. The true LTA and TA precursors do not have exactly the same electrophoretic mobility, being shifted relative to each other by about half a ladder “step”. This difference is visible when samples are run in adjacent lanes on the same gel, as in the new Fig. 6C. The difference in migration was well documented in the original paper about the deletion of tacL, although tacL was known as rafX at that time, and the ladders were misidentified as WTA (Wu et al. 2014. A novel protein, RafX, is important for common cell wall polysaccharide biosynthesis in Streptococcus pneumoniae: implications for bacterial virulence. J Bacteriol. 196, 3324-34. doi: 10.1128/JB.01696-14). This reference was added in V2. The experiment in the new Fig. 6C was repeated to have all samples on the same gel and treated at a lower temperature. The minor effect on the amount of LTA when WT cells are heated at pH 4.2 may be due to the removal of some labeled phosphocholine. We have NMR evidence that the phosphocholine in position D, which may be lacking in some cases, is labile to acidic treatment of LTA, as reported by Hess et al. (Nat Commun. 2017 Dec 12;8(1):2093. doi: 10.1038/s41467-017-01720-z).

      (6) L. 339-351, Fig. 6A. A single lane on a gel is not very convincing as to the role of LytR. Here, and throughout the manuscript, wherever statements concerning levels of material are made, quantification needs to be done over appropriate numbers of repeats and with densitometry data shown in SI.

      Yes indeed. Apart from the titration of TA in the WT strain, we haven’t yet carried out a thorough quantification of TA or LTA/WTA ratio in different strains and conditions, although we intend to do so in a follow-up study, using the novel opportunities offered by the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments performed in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14. The value of 51% was a calculation error, and was corrected to 41%. Likewise, the decrease in the WTA/LTA ratio was corrected to 5 to 7-fold.

      (7) 14. l. 385-391. Contrary to the statement in the text, the zwitterionic TA will have associated counterions that result in net neutrality. It will just have both -ve and +ve counterions in equal amounts (dependent on their valency), which doesn't matter if it is doing the job of balancing osmolarity (rather than charge).

      Thank you for pointing this out. The paragraph has been corrected in V2.

      Reviewer #2 (Public review):

      The Gram-positive cell wall consists in large part of TAs, and is essential for most bacteria. However, TA biosynthesis and regulation is highly understudied because of the difficulties in working with these molecules. This study closes some of our important knowledge gaps related to this and provides new and improved methods to study TAs. It also shows an interesting role for TAs in maintaining a 'periplasmic space' in Gram positives. Overall, this is an important piece of work. It would have been more satisfying if the possible causal link between TAs and periplasmic space would have been more deeply investigated with complemented mutants and CEMOVIS. For the moment, there is clearly something happening but it is not clear if this only happens in TA mutants or also in strains with capsules/without capsules and in PG mutants, or in lafB (essential for production of another glycolipid) mutants. Finally, some very strong statements are made suggesting several papers in the literature are incorrect, without actually providing any substantiation/evidence supporting these claims. Nevertheless, I support the publication of this work as it pioneers some new methods that will definitively move the field forward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) l. 55 It is stated that TA are generally not essential. This needs to be introduced in a little more detail as in several species they are collectively. Need some more references here to give context.

      We have expanded the paragraph and added a selection of references in V2.

      (2) l. 63 and Fig. 1A. Is the model based on the images from this paper? Is the periplasm as thick as the peptidoglycan layer? Would you not expect the density of WTA to be the same throughout the wall, rather than less inside? Do the authors think that the TA are present as rods in the cell envelope and because of this the periplasm looks a little like a bilayer, is this so? Is the relative thickness of the layers based on the data in the paper (Table 1)?

      The model proposed in Fig. 1A is not based on our data. It is a representation of the model proposed by Harold Erickson, and the appropriate reference has been added to the figure legend in V2. We do not speculate on the relative density of WTA inside the peptidoglycan layer, at the surface or in the periplasm. The only constraint from the model is that the density of WTA in the periplasm should be sufficient for self-exclusion and allow the brush polymer theory to apply. The legend has been amended in V2.

      We indeed think that the bilayer appearance of the periplasmic space in the wild type strain, and the single layer periplasmic space in the ∆tacL and ∆lytR strains, support Erickson’s model. Although the model was drawn arbitrarily, it turns out that the relative thickness of the peptidoglycan and periplasmic space is in rough agreement with the measurements reported in Table 1.

      (3) Fig. 2. It is hard to orient oneself to see the layers. The use of the term periplasmic space (l. 132) and throughout is probably not wise as it is not a space.

      We prefer to retain this nomenclature since the term periplasmic space has been used in all the cell envelope CEMOVIS publications and is at the core of Erickson’s hypothesis about these observations and teichoic acids.

      (4) L. 147. This is not referring to Fig. S2A-B as suggested but Fig. S3A-B.

      This has been corrected.

      (5) l. 148. How do you know the densities observed are due to PG or certainly PG alone? Perhaps it is better to call this the cell wall.

      Yes. Cell wall is a better nomenclature and the text and Table 1 have been corrected in V2, in accordance with Fig. 2.

      (6) l. 165. It is also worth noting that peripheral cell wall synthesis also happens at the same site so this may well not be just division.

      Yes. We have replaced “division site” by “mid-cell” in V2.

      (7) l. 214 What is the debris? If PG digestion has been successful then there will be marginal debris. Is this pellet translucent (like membranes)? If you use fluorescently labelled PG in the preparation has it all disappeared, as would be expected by fully digested and solubilised material?

      In traditional protocols of bacterial membrane preparation, a low-speed centrifugation is first performed to discard “debris” that to our knowledge have not been well characterized but are thought to consist of unbroken cells and large fragments of cell wall. After enzymatic degradation of the pneumococcal cell wall, the low-speed pellet is not translucent as in typical membrane pellets after ultracentrifugation, but is rather loose, unlike a dense pellet of unbroken cells. A description of the pellet appearance was added in V2.

      It is a good idea to check if some labeled PG is also pelleted at low-speed after digestion. In a double labeling experiment using azido-choline and a novel unpublished metabolic probe of the PG, we found that the PG was fully digested and labeled fragments migrated as a couple of fuzzy bands likely corresponding to different labeled peptides. These species were not pelleted at low speed.

      (8) l. 219. Can you give a reference to certify that the low mobility material is WTA? Why does it migrate differently than LTA? Or is the PG digestion not efficient?

      WTA released from sacculi by alkaline lysis were found to migrate as a smear at the top of native gels revealed by alcian-blue silver staining, which is incompatible with SDS (Flores-Kim, 2019, 2022). The references have been added in V2. It could be argued in this case that the smearing was due to partial degradation of the WTA by the alkaline treatment.

      Bui et al. (2012) reported the preparation of WTA by enzymatic digestion of sacculi, but the resulting WTA were without muropeptide, presumably due to a step of boiling at pH 5 used to deactivate the enzymes.

      To our knowledge, this is the first report of pneumococcal WTA prepared by digestion of sacculi and analyzed by SDS-PAGE. Since the migration of WTA in native and SDS-PAGE is similar, we hypothesize that they do not interact significantly with the dodecyl sulphate, in contrast to the LTA, which bear a lipidic moiety. The fuzziness of the WTA migration pattern may also result from the greater heterogeneity due to the attached muropeptide, such as different lengths (di-, tetra-saccharide…), different peptides despite the action of LytA (tri-, tetra-peptide…), different O-acetylation status, etc.

      (9) L. 226-227, Fig S8. Presumably several of the major bands on the Coomassie stained gel are the lysozyme, mutanolysin, recombinant LytA, DNase and RNase used to digest the cell wall etc.? Can the sizes of these proteins be marked on the gel. Do any of them come down with the material at low-speed centrifugation?

      We have provided a gel showing the different enzymes individually and mixed (new Fig. S9G). While performing several experiments of this type, we found that the mutanolysin might be contaminated with proteases. The enzymes do not appear to sediment at low speed.

      (10) Fig. S9B. It is difficult to interpret what is in the image as there appear to be 2 populations of material (grey and sometimes more raised). Does the 20,000 g material look the same?

      Fig. S10B is a 20,000 × g pellet. We agree that there appear to be two types of membrane vesicles, but we do not know their nature.

      (11) l. 277 and Fig. 5A. Why is it "remarkable" that there are apparently more longer LTA molecules as the cell reach stationary phase?

      This is the first time that a change of TA length has been documented. Such a change could conceivably have consequences for the binding and activity of CBPs and the physiology of the cell envelope in general. These questions should be addressed in future studies.

      (12) l. 280. How do you know which is the 6-repeat unit?

      It is an assumption based on previous analyses by Gisch et al. (J Biol Chem 2013, 288(22):15654-67. doi: 10.1074/jbc.M112.446963). The reference was added.

      (13) Fig. 5A and C. Panel C, the cells were grown in a different medium and so are not comparable to Panel A. Why is Fig. S12B not substituted for 5B? Presumably these are exponential phase cells.

      We have swapped Fig. S13B and Fig. 5C in V2, as suggested, and changed the text and legends accordingly.

      Reviewer #2 (Recommendations for the authors):

      L30: vitreous sections?

      Corrected in V2.

      L32: as their main universal function --> as a universal function. To show it's the main universal function, you will need to look at this across various bacterial species.

      Changed to “possible universal function” in V2.

      L35: enabled the titration the actual --> titration of the actual?

      Corrected in V2.

      L34: consider breaking up this very long sentence.

      Done in V2.

      L37: may compensate the absence--> may compensate for the absence.

      Corrected in V2.

      L45: Using metabolic labeling and electrophoresis showed --> Metabolic labeling and...

      Corrected in V2.

      L46: This finding casts doubts on previous results, since most LTA were likely unknowingly discarded in these studies. This needs to be rephrased and is unnecessarily callous. While the current work casts doubts on any quantitative assessments of actual LTA levels measured in previous studies, it does not mean any qualitative assessments or conclusions drawn from these experiments are wrong. Better would be to say: These findings suggest that previously reported quantitative assessments of LTA levels are likely underestimating actual LTA levels, since much of the LTA would have been unknowingly discarded.

      If the authors do think that actual conclusions are wrong in previous work, then they need to be more explicit and explain why they were wrong.

      Yes indeed. The statement was toned down in V2.

      L55: Although generally non-essential. I would remove or rephrase this statement. I don't think any TA mutant will survive out in the wild and will be essential under a certain condition. So perhaps not essential for growth under ideal conditions, but for the rest pretty essential.

      The paragraph was amended by qualifying the essentiality to laboratory conditions and including selected references.

      L95: Note that the prevailing model until reference 20 (Gibson and Veening) was that the TA is polymerized intracellularly (see e.g. Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026). This intracellular polymerisation model seemed unlikely according to Gibson and Veening ('As TarP is classified by PFAM as a Wzy-type polymerase with predicted active site outside the cell, we speculate that TarP and TarQ polymerize the TA extracellularly in contrast to previous reports.'), but there is no experimental evidence as far as this referee knows of either model being correct.

      Despite the lack of experimental evidence, we think that Gibson and Veening are very likely correct, based on their argument, and also by analogy with the synthesis of other surface polysaccharides from undecaprenyl- or dolichol-linked precursors. It is unfortunate that Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026 was published in this way, since there was no evidence for a cytoplasmic polymerization, to our knowledge.

      L97: It is commonly believed, although I'm not sure it has ever been shown, that the capsule is covalently attached at the same position on the PG as WTA. Therefore, there must be some sort of regulation/competition between capsule biosynthesis and WTA biosynthesis (see also ref. 21). The presence of the capsule might thus also influence the characteristics of the periplasmic space. Considering that by far most pneumococcal strains are encapsulated, the authors should discuss this and why a capsule mutant was used in this study and how translatable their study using a capsule mutant is to S. pneumoniae in general.

      A paragraph was added in the Introduction of V2 to present the complication and a sentence was added at the end of the discussion to mention that this should be studied in the future.

      L102: Ref 29 should probably be cited here as well?

      Since in Ref 29 (Flores-Kim et al. 2019) there is a detectable amount of LTA (presumably TA precursors) in the ∆tacL strain, we prefer to cite only Hess et al. 2017 regarding the absence of LTA in the absence of TacL. However, we added in V2 a reference to Flores-Kim et al. 2019 in the following paragraph regarding the role of the LTA/WTA ratio.

      L106: dependent on the presence of the phosphotransferase LytR (21). --> dependent on the presence of the phosphotransferase LytR, whose expression is upregulated during competence (21).

      Corrected in V2.

      L119: I fail to see how the conclusions drawn by other groups (I assume the authors mean work from the Vollmer, Rudner, Bernhardt, Hammerschmidt, Havarstein, Veening groups?) are invalid if they compared WTA:LTA ratios between strains and conditions if they underestimated the LTA levels? Supposedly, the LTA levels were underestimated in all samples equally so the relative WTA/LTA ratio changes will qualitatively give the same outcome? I agree that these findings will allow for a reassessment of previous studies in which presumably too low LTA levels were reported, but I would not expect a difference in outcome when people compared WTA:LTA ratios between strains?

      The sentence was rephrased in V2 to be neutral regarding previous work and rather emphasize future possibilities.

      L131: Perhaps it would be good to highlight that such a conspicuous space has been noticed before by other EM methods (see e.g. Figs.4 and 5 or ref 19, or one of the most clear TEM S. pneumoniae images I have seen in Fig. 1F of Gallay et al, Nat. Micro 2021). However, always some sort of staining had previously been performed so it was never clear this was a real periplasmic space. CEMOVIS has this big advantage of being label free and imaging cells in their presumed native state.

      Thanks for pointing out these beautiful data that we had overlooked. We have added a few sentences and references in the Discussion of V2.

      L201: References are not numbered.

      Corrected in V2.

      L271/L892: Change section title. 'Evolution' can have multiple meanings. It would be more clear to write something like 'Increased TA chain length in stationary phase cells' or something like that.

      Changed in V2.

      L275: harvested

      Corrected in V2.

      L329: add, as suggested shown previously (I guess refs 24 and 29)

      Reference to Hess et al. 2017 has been added in V2. A sentence and further references to Flores-Kim, 2019, 2022 and Wu et al. 2014 were added at the end of the discussion with respect to the LTA-like signal observed in these studies of ∆tacL strains.

      L337: I think a concluding sentence is warranted here. These experiments demonstrate that membrane-bound TA precursors accumulate on the outside of the membrane, and are likely polymerized on the outside as well, in line with the model proposed in ref. 20.

      From the point of view of formal logic, the accumulation of membrane-bound TA precursors on the outer face of the membrane does not prove that they were assembled there. They could still be polymerized inside and translocated immediately. However, since this is extremely unlikely for the reasons discussed by Gibson and Veening, we have added a mild conclusion sentence and the reference in V2.

      L343: How accurate are these quantifications? Just by looking at the gel, it seems there is much less WTA in the lytR mutant than 50% of the wild type?

      Yes, the 51% value was a calculation error. This was changed to 41%. Likewise, the decrease of the WTA amount relative to LTA was corrected to 5- to 7-fold.

      Apart from the titration of TA in the WT strain, we haven’t yet carried out a careful quantification of either TA or the LTA/WTA ratio in different strains and conditions, although we intend to do so in the near future using the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments of growth in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14.

      L342: although WTA are less abundant and LTA appear to be longer (Fig. 6A). --> although WTA are less abundant and LTA appear to be longer (Fig. 6A), in line with a previous report showing that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). (or something like that). Perhaps better is to start this paragraph differently. For instance: Previous work showed that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). As shown in Fig. 6A, the proportion of WTA significantly decreased in the lytR mutant. However, there was still significant WTA present indicating that perhaps another LCP protein can also produce WTA.

      Changed in V2.

      Of note, WTA levels would be a lot lower in encapsulated strains as used in Ref. 21 (assuming WTA and capsule compete for the same linkage on PG). So perhaps it would be hard to detect any residual WTA in an encapsulated lytR mutant?

      Investigation of the relationship between TA and capsule incorporation or O-acetylation is definitely a future area of study using this method of TA monitoring.

      L371: see my comments related to L131. Some TEM images clearly show the presence of a periplasmic space.

      Comments and references have been added in V2.

      L402: It would be really interesting to perform these experiments on a wild type encapsulated strain. Would these have much more LTA? (I understand you cannot do these experiments perhaps due to biosafety, but it might be interesting to discuss).

      Yes. It would be interesting to compare the TA in D39 and D39 ∆cps strains. We have added this perspective at the end of the discussion in V2.

      L418: ref lacks number

      Corrected in V2.

      L423: refs missing.

      References added in V2.

      L487: See my comments regarding L46. I do not see one valid point in the current paper why underestimating LTA levels would change any of the conclusions drawn in Ref. 21. I do not know the other papers cited well enough, but it seems highly unlikely that their conclusions would be wrong by systematically underestimating LTA levels. As far as I understand it, this current work basically confirms the major conclusions drawn by these 'doubtful' papers (that TacL makes LTA and LytR is the main WTA producer). As such, I find this sentence highly unfair without precisely specifying what the exact doubts are. Sure, this current paper now shows that probably people have discarded unknowingly LTA and therefore underestimated LTA levels, so any quantitative assessment of LTA levels are probably wrong. That is one thing. But to say this casts doubts on these studies is very serious and unfair (unless the authors provide good arguments to support these serious claims).

      Yes indeed. The sentence was rephrased to be strictly factual in V2.

      Table 2: I assume these strains are delta cps? Would be relevant to list this genotype.

      Table 2 was completed in V2.

      The authors should comment on why the mutants have not been complemented, especially for lytR as it's the last gene in a complex operon. It would be great to see WTA levels being restored by ectopic expression of LytR.

      Yes. We think this could be part of an in-depth study of the attachment of WTA, together with the investigation of the other LCP phosphotransferases.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study characterizes the role of sulfakinin and the sulfakinin receptor 1 in changes in olfactory responses associated with foraging versus mating behavior in the oriental fruit fly (Bactrocera dorsalis), a significant agricultural pest. This pathway regulates food consumption and mating receptivity in other species; here the authors use genetic disruption of sulfakinin and sulfakinin receptor 1 to provide strong evidence that changes in sulfakinin signaling modulate antennal responses to food versus pheromonal cues and alter the expression of ORs that detect relevant stimuli.

      Strengths:

      The authors utilize multiple complementary approaches including CRISPR/Cas9 mutagenesis, behavioral characterization, electroantennograms, RNA sequencing and heterologous expression to convincingly demonstrate the involvement of the sulfakinin pathway in the switch between foraging and mating behaviors. The use of both sulfakinin peptide and receptor mutants is a strength of the study and implicates specific signaling actors.

      Weaknesses:

      The authors demonstrate that SKR is expressed in olfactory neurons, however there are additional potential sites of action that may contribute to these results.

      Recommendations for the authors:

      The authors have addressed most of the issues raised by the reviewers. Below are a few outstanding issues.

      (1) Lines 68-69 describe "control of B. dorsalis include the use of the behavioral responses to semiochemicals" but does not describe what these responses are or how behavior is modulated.

      The sentence was revised to “Control of B. dorsalis include the use of the reproductive and feeding behavioral responses to semiochemicals” (line 69 in the revision).

      (2) Statistical analysis for 9 hour starved females at 5 minutes is missing in Figure 1D and S1.

      We have added the statistical analysis for 9 hour starved females at 5 minutes to the revised Figures 1D and S1, respectively (line 578).

      (3) The legend in Figure S2 should be revised as it is not clear from the figure which of the odors are food associated odors.

      As suggested, we added a food odor label to the revised Figure S2 (line 666).

      (4) Line 167: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the pheromonal components, while down regulated genes, OR49a and OR63a, were activated by food volatiles." Based on the data, this sentence is incorrect and should read: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the food components, whereas downregulated genes, OR49a and OR63a, were activated by pheromonal components."

      We apologize for this mistake. We have corrected it (lines 168-169).

      (5) Line 192: "The coordinated action of sulfakinin on mutiple downstreams,..." should be revised to "downstream pathways or tissues" or simply removing "multiple downstream".

      As suggested, we removed “multiple downstream”. See line 192.

      (6) Reference formatting is inconsistent: see line 207 vs line 208.

      We have corrected it to “(Wu et al., 2019)” (line 207).

      (7) Lines 241-244 The broad discussion regarding the evolution and ancestral function of CCK here and the phylogeny in Figure S6 are peripheral to the authors claims.

      As suggested, we removed the section and the Figure S6 in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research article by Nath et al. from the Lee Lab addresses how lipolysis under starvation is achieved by a transient receptor potential channel, TRPγ, in the neuroendocrine neurons to help animals survive prolonged starvation. Through a series of genetic analyses, the authors identify that TRPγ mutations specifically lead to a failure in lipolytic processes under starvation, thereby reducing animals' starvation resistance. The conclusion was confirmed through total triacylglycerol levels in the animals and lipid droplet staining in the fat bodies. This study highlights the importance of transient receptor potential (TRP) channels in the fly brain to modulate energy homeostasis and combat metabolic stress. While the data is compelling and the message is easy to follow, several aspects require further clarification to improve the interpretation of the research and its visibility in the field.

      Strengths:

      This study identifies the biological meaning of TRPγ in promoting lipolysis during starvation, advancing our knowledge about TRP channels and the neural mechanisms to combat metabolic stress. Furthermore, this study demonstrates the potential of the TRP channel as a target to develop new therapeutic strategies for human metabolic disorders by showing that metformin and AMPK pathways are involved in its function in lipid metabolisms during starvation in Drosophila.

      Weaknesses:

      Some key results that might strengthen their conclusions were left out for discussion or careful explanation (see below). If the authors could improve the writing to address their findings and connect their findings with conclusions, the research would be much more appreciated and have a higher impact in the field.

      Here, I listed the major issues and suggestions for the authors to improve their manuscript:

      (1) Are the increased lipid droplet size and the upregulated total TAG level measured in the starved or sated mutant in Figure 1? This information might be crucial for readers to understand the physiological function of TRP in lipid metabolism. In other words, clarifying whether the upregulated lipid storage is observed only in the starved trp mutant will advance our knowledge of TRPγ. If the increase of total TAG level is only observed in the starved animals, TRP in the Dh44 neurons might serve as a sensor for the starvation state required to promote lipolysis in starvation conditions. On the other hand, if the total TAG level increases in both starved and sated animals, activation of Dh44 through TRPγ might be involved in the lipid metabolism process after food ingestion.

      We measured total TAG levels in Figure 1 and LD sizes in Figure 2 under sated conditions. We inserted “under sated condition” to clarify this (lines 97 and 147-148).

      Thanks for your suggestions.

      (2) It is unclear how AMPK activation in Dh44 neurons reduces the total triacylglycerol (TAG) levels in the animals (Figure 3G). As AMPK is activated in response to metabolic stress, the result in Figure 3G might suggest that Dh44 neurons sense metabolic stress through AMPK activation to promote lipolysis in other tissues. Do Dh44 neurons become more active during starvation? Is activation of Dh44 neurons sufficient to activate AMPK in the Dh44 neurons without starvation? Is activation of AMPK in the Dh44 neurons required for Dh44 release and lipolysis during starvation? These answers would provide more insights into the conclusion in Lines 192-193.

      In our previous study, we demonstrated that trpγ mutants exhibited lower levels of glucose, trehalose, and glycogen (Dhakal et al. 2022), and in the current study, we observed excessive lipid storage in the trpγ mutant, indicating imbalanced energy homeostasis. Given the established role of AMPK in maintaining energy balance (Marzano et al., 2021; Lin et al., 2021), we employed the activated form of AMPK (UAS-AMPK<sup>TD</sup>) in our experiments. Our results showed that expression of activated AMPK in Dh44 neurons led to a reduction in total TAG levels, suggesting that AMPK activation in these neurons can promote lipolysis even in the absence of starvation. Regarding the activation of Dh44 neurons, Dus et al. (2015) reported that Dh44 cells in the brain are activated by nutritive sugars, especially under starvation conditions. In addition, another report showed a role of Dh44 neurons in regulating starvation-induced sleep suppression (Oh et al., 2023), which may imply that these neurons become more active under starved conditions. Although we did not directly assess whether Dh44 neuron activity increases during starvation or whether AMPK activation in these neurons is required for DH44 release and subsequent lipolysis, our findings support the notion that AMPK activation in Dh44 neurons is sufficient to reduce TAG levels, potentially through the metabolic stress response typically observed during starvation. We explained it as follows: “Dh44 neurons regulate starvation-induced sleep suppression (Oh et. al., 2023), which implies that these neurons become more active under starved conditions.” lines 190-191.

      (3) It is unclear how the lipolytic gene brummer is further downregulated in the trpγ mutant during starvation while brummer is upregulated in the control group (Figure 6A). This result implies that the trpγ mutant was able to sense the starvation state but responded abnormally by inhibiting the lipolytic process rather than promoting lipolysis, which makes it more susceptible to starvation (Figure 3B).

      Thanks for your suggestions. We explained it as follows: “The data indicates that the trpg mutant can sense the starvation state but responds abnormally by suppressing lipolysis instead of activating it. This dysregulated lipolytic response likely increases the mutant's vulnerability to starvation, as it cannot effectively mobilize lipid stores for energy during periods of nutrient deprivation.” lines 251-254.

      (4) There is an inconsistency of total TAG levels and the lipid droplet size observed in the Dh44 mutant but not in the Dh44-R2 mutant (Figures 7A and 7F). This inconsistency raises a possibility that the signaling pathway from Dh44 release to its receptor Dh44-R2 only accounts for part of the lipid metabolic process under starvation. Adding discussion to address this inconsistency may be helpful for readers to appreciate the finding.

      Thanks for your suggestion. We included the following in the Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the function of trpγ in lipid metabolism was investigated. The authors found that lipid accumulation levels were increased in trpγ mutants and remained high during starvation; the increased TAG levels in trpγ mutants were restored by the expression of active AMPK in DH44 neurons and oral administration of the anti-diabetic drug metformin. Furthermore, oral administration of lipase, TAG, and free fatty acids effectively restored the survival of trpγ mutants under starvation conditions. These results indicate that TRPγ plays an important role in the maintenance of systemic lipid levels through the proper expression of lipase. Furthermore, the authors have shown that this function is mediated by DH44R2. This study provides an interesting finding in that the neuropeptide DH44 released from the brain regulates lipid metabolism through a brain-gut axis, acting on the receptor DH44R2 presumably expressed in gut cells.

      Strengths:

      Using Drosophila genetics, this study carefully analyzes which trpγ-expressing cells regulate lipid metabolism. The study supports its conclusions from various angles, including not only TAG levels, but also fat droplet staining and survival rate under starved conditions, and oral administration of substances involved in lipid metabolism.

      Weaknesses:

      Lipid metabolism in the gut of DH44R2-expressing cells should be investigated for a better understanding of the mechanism. Fat accumulation in the gut is not mechanistically linked with fat accumulation in the fat body. The function of lipase in the gut (esp. R2 region) should be addressed, e.g. by manipulating gut lipases such as magro or Lip3 in the gut in the context of the trpγ mutant. Also, it is not clarified in which cell types in the gut DH44R2 is expressed. The study also mentioned only in the text that bmm expression in the gut cannot restore lipid droplet enlargement in the fat body, but this result might be presented as a figure.

      We appreciate the reviewer’s insightful suggestions. Unfortunately, due to the unavailability of the reagent (UAS-Lip3), we were unable to manipulate gut lipase in trpγ mutants as proposed. However, we additionally performed immunostaining to examine the co-expression of trpγ and Dh44R2 in the gut, and our results indicate that both trpγ and Dh44R2 are co-expressed in the R2 region of the gut (Figure 7O and P). Furthermore, we have updated our figures in the revised version to show that bmm expression in the gut does not restore lipid droplet enlargement in the fat body (Figure 5I and J).

      Reviewer #3 (Public Review):

      In this manuscript, the authors demonstrated the significance of the TRPγ channel in regulating internal TAG levels. They found high TAG levels in TRPγ mutant, which was ascribed to a deficit in the lipolysis process due to the downregulation of brummer (bmm). It was notable that the expression of TRPγ in DH44+ PI neurons, but not dILP2+ neurons, in the brain restored the internal TAG levels and that the knockdown of TRPγ in DH44+ PI neurons resulted in an increase in TAG levels. These results suggested a non-cell autonomous effect of Dh44+PI neurons. Additionally, the expression of the TRPγ channel in Dh44 R2-expressing cells restored the internal TAG levels. The authors, however, did not provide an explanation of how TRPγ might function in both presynaptic and postsynaptic cells in the non-cell autonomous manner to regulate the TAG storage. The authors further determined the effect of TRPγ mutation on the size of lipid droplets (LD) and the lifespan and found that TRPγ mutation caused an increase in the size of LD and a decrease in the lifespan, which were reverted by feeding lipase and metformin. These were creative endeavors, I thought. The finding that DH44+ PI neurons have non-cell autonomous functions in regulating bodily metabolism (mainly sugar/lipid) in addition to directing sugar nutrient sensing and consumption is likely correct, but the paper has many loose ends. I would like to see a revision that includes more experiments to tighten up the findings and appropriate interpretations of the results.

      (1) The authors need to provide interpretations or speculations as to how DH44+ PI neurons have non-cell autonomous functions in regulating the internal TAG stores, and how both presynaptic DH44 neurons and postsynaptic DH44 R2 neurons require TRPγ for lipid homeostasis.

      In the Discussion, we had mentioned our previous finding: “We previously proposed that TRPg holds DH44 neurons in a state of afterdepolarization, thus reducing firing rates by inactivating voltage-gated Na+ channels (Dhakal et al., 2022). At the physiological level, this induces the consistent release of DH44 and depletion of DH44 stores, resulting in nutrient utilization and storage malfunctions.”

      We also included the following: “TRPg in DH44 neurons may influence the release of metabolic signals or hormones that act on postsynaptic DH44R2 cells. These postsynaptic cells could, in turn, modulate lipid storage and metabolism in a non-cell autonomous manner. However, the mechanism by which TRPg functions in DH44R2 cells remains unclear. One possible explanation is that TRPg in the gut may be activated by stretch or osmolarity (Akitake et al. 2015).” lines 439-440.

      This interaction between presynaptic and postsynaptic cells may ensure a coordinated response to metabolic changes and maintain lipid homeostasis. Thus, both Dh44-expressing and Dh44-R2-expressing cells are crucial for the proper functioning of TRPγ in regulating internal TAG levels and lipid storage.

      (2) The expression of TRPγ solely in DH44 R2 neurons of TRPγ mutant flies restored the TAG phenotype, suggesting an important function mediated by TRPγ in DH44 R2 neurons. However, the authors did not document the endogenous expression of TRPγ in the DH44R2+ gut cells. This needs to be shown.

      We appreciate the reviewer’s suggestion. To address this, we performed immunostaining to examine the expression of TRPγ in the DH44R2+ gut cells. Our results, as shown in Figure 7O and P, confirm that TRPγ is co-expressed in the Dh44R2+ cells in the gut. We also found that Dh44R2 is expressed in the brain. We documented this part as follows: “Given that Dh44R2 is predominantly expressed in the intestine, we performed immunostaining to examine whether Dh44R2 co-localizes with trpg in gut cells. Our results confirmed that Dh44R2 and trpg are co-expressed in intestinal cells (Figure 7O and P). Additionally, we analyzed Dh44R2 expression in the brain and found that two Dh44R2-expressing cells are co-localized with Dh44-expressing cells in the PI region (Figure 7Q). To further delineate whether Dh44R2-mediated fat utilization is specific to the brain, gut, or fat body, we knocked down Dh44R2<sup>RNAi</sup> using Dh44-GAL4, myo1A-GAL4, and cg-GAL4, respectively (Figure 7–figure supplement 1E). Notably, knockdown of Dh44R2 with Myo1A-GAL4 resulted in elevated TAG levels, indicating that DH44R2 activity in lipid metabolism is specific to the gut.” lines 375-384.

      (3) While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels (Figure 7A). This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than Dh44. Alternatively, a Dh44 neuropeptide-independent pathway could mediate the lipolysis. In either case, an additional result is needed to substantiate either one of the hypotheses.

      The Dh44 mutant flies exhibited normal TAG levels, whereas Dh44R2 mutant flies showed elevated TAG levels. However, when we examined the lipid droplets in the fat body, both Dh44 mutant and Dh44R2 mutant flies displayed larger lipid droplets, indicating a disruption in lipid metabolism. Additionally, we assessed starvation survival time and found that both Dh44 and Dh44R2 mutant flies exhibited reduced survival under starvation conditions compared to controls. Supplementation with lipase (Figure 7–figure supplement 1A), glycerol (Figure 7–figure supplement 1B), hexanoic acid (Figure 7–figure supplement 1C), and mixed TAGs (Figure 7–figure supplement 1D) improved starvation survival time, further supporting that the lipid metabolism pathway was impaired in both mutants. These observations highlight the role of Dh44 in regulating lipolysis. We included related Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      (4) While the authors observed an increased area of fat body lipid droplets (LD) in Dh44 mutant flies (Figure 7F), they did not specify the particular region of the fat body chosen for measuring the LD area.

      We chose abdominal segments 2-3 for all fat body images, as already stated in the Nile red staining part of the Methods section (lines 630-631).

      (5) The LD area only accounts for TAG levels in the fat body, whereas TAG can be found in many other body parts, including the R2 area as demonstrated in Figure 5A-D using Nile red staining. As such, measuring the total internal TAG levels would provide a more accurate representation of TAG levels than the average fat body LD area.

      We measured total internal TAG levels in the whole body throughout the experiments (Figure 1F, 2C, 2E, 3C, 3G, 4A, 4B, 7A, 7I, and many Supplementary Figures), except for bmm expression using the GAL4/UAS system. We now include these new data (Figure 5–figure supplement 1), which lead to the same conclusion as the LD analysis.

      (6) In Figure 5F-I, the authors should perform the similar experiment with Dh44, Dh44R1, and Dh44R2 mutant flies.

      We performed the experiments with Dh44, Dh44R1, and Dh44R2 mutant flies and found that Dh44 and Dh44R2 mutant flies showed a shorter starvation survival time than the control, which was increased by supplementation with lipase, glycerol, hexanoic acid, and TAG (Figure 7–figure supplement 1A–D). lines 361-372.

      (7) The representative image in Figure 6B does not correspond to the GFP quantification results shown in Figure 6C. In trpγ<sup>1</sup>;bmm::GFP flies, the GFP signal appears stronger in starved conditions than in satiated conditions.

      We updated it with new images. We quantified the GFP intensity using ImageJ and found that it was significantly lower under starved conditions than under sated conditions in trpγ<sup>1</sup>;bmm::GFP flies.

      (8) In Figure 6H-I, fat body-specific expression of bmm reversed the increased LD area in TRPγ mutants. The authors also showed that Dh44+PI neuron-specific expression of bmm yielded a similar result. The authors need to provide an interpretation as to how bmm acts in the fat body or DH44 neurons to regulate this.

      We first inserted the following in results: “Furthermore, the expression of bmm in the fat body, as well as Dh44 neurons in the PI region, can promote lipolysis at the systemic level.” lines 276-277.

      Additionally, we discussed it in the Discussion: “Brummer lipase is essential for regulating lipid levels in the insect fat body by mediating lipid mobilization and energy homeostasis. In Nilaparvata lugens, it facilitates triglyceride breakdown (Lu et al., 2018), while studies in Drosophila show that reduced Brummer lipase expression decreases fatty acids and increases diacylglycerol levels, highlighting its role in lipid metabolism (Nazario-Yepiz et al., 2021). Here, we additionally demonstrate that bmm expression in DH44 neurons within the PI region can systemically regulate TAG levels. Cell signaling or energy status in DH44 neurons may contribute to hormonal release that targets organs such as the fat body.” lines 451-459.

      (9) The authors should explain why the DH44 R1 mutant did not represent similar results as the wild type.

      We added “In addition, bmm levels in Dh44R1<sup>Mi</sup> under starved condition did not increase as significantly as in the control. This suggests a unique role of DH44 and its receptors in regulating lipid metabolism and response to nutritional status in Drosophila.” lines 358-360.

      (10) It would be good to have a schematic that represents the working model proposed in this manuscript.

      We updated the schematic model in revised version (Figure 8).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      This paper characterized the function of trpγ in Dh44-expressing PI neurons for lipid metabolism and lipolysis induced by prolonged starvation. The authors applied a series of lipolytic genetic manipulation and lipid/lipid metabolism supplements to rescue the trpγ deficits in lipolysis: the expression of active AMPK in the DH44-expressing PI neurons or brummer, a lipolytic gene, in the trpγ-expressing cells, and oral administration of the anti-diabetic drug metformin, lipase, TAG and free fatty acids. Despite this exhaustive characterization of the defective lipolysis in the trpγ mutants, there remain puzzles in inconsistent defects of Dh44 and DH44R2 in the total TAG levels and in the expression and functions of the receptor in the gut. Clarification of these points and other issues raised by the reviewers should improve the mechanisms of lipid metabolism through Dh44 signalling.

      Reviewer #1 (Recommendations For The Authors):

      (1) It might be worth introducing Dh44 in the introduction section as it is unclear to readers how the authors hypothesized the site-of-action of TRPγ in Dh44 neurons for lipid metabolism after reading the introduction.

      We introduced the following: “We found that TRPg expression in Dh44 neuroendocrine cells in the brain is critical for maintaining normal carbohydrate levels in tissues (Dhakal et al. 2022). Building on this, we hypothesized that TRPg in Dh44 cells also regulates lipid and protein homeostasis.” lines 69-71.

      (2) Providing a summary model in the end to integrate the present findings and their previous publication about TRPγ functions in Drosophila sugar selection would greatly help readers understand and appreciate the general role of TRPγ in balancing energy homeostasis.

      We made a schematic model in Figure 8.

      (3) Swapping the order of Figures 5 and 6 might be a better way to tell the story without logic gaps. The results addressing the mechanisms of metformin and TRPγ in promoting lipolysis under starvation are interrupted by the lipid storage data in the R2 cells in the current Figure 5A-5E. In addition, presenting Figure 5A-5E before or together with Figure 7 will help readers appreciate the expression of Dh44-R2 and its function in regulating lipid metabolism in Figure 7.

      We did.

      (4) It might be misleading to use the word "sated" for the condition of 5-hour mild starvation. The word "mild starvation" or the equivalents might be a better word choice.

      We appreciate the reviewer’s concern. As the hemolymph sugar level does not drop significantly after 5 hr of starvation, previous papers (Dus et al 2015, Dhakal et al 2022) referred to it as the sated condition. For consistency, we prefer using “sated” instead of “mild starvation”.

      (5) It is unclear what the white arrows are pointing at in Figures 7O and 7P. Some of those seem to be non-specific signals, so it is hard to connect the figure to the conclusion in Lines 351-353. It would be helpful to add some explanations to help readers interpret Figures 7O and 7P.

      In the previous version, the white arrows in Figure 7O and 7P represented the expression of Dh44R2 in the SEZ region of the brain and the R2 region of the gut. In the revised version, for clarity, we performed additional immunostaining for the co-expression of trpγ and Dh44R2 in the gut. We found that trpγ and Dh44R2 are co-expressed specifically in the R2 region of the gut (Figure 7O and P). Similarly, we found that two Dh44R2-expressing cells co-localize with Dh44 cells in the PI region of the brain (now Figure 7Q). We updated this part. lines 375-380.

      (6) The figure legend for the (G) panel in Figure 2-figure Supplement 1 was mislabeled as (F).

      We corrected it.

      (7) In Line 85, the authors might want to write "… among these mutants, only trpγ mutant displayed reduced carbohydrate levels, suggesting …". Please confirm the information for the sentence. lines 87-88.

      We clarified it.

      Reviewer #2 (Recommendations For The Authors):

      (1) The trpγ[G4] would be difficult for non-Drosophila researchers to understand; it would be better to use trpγ-Gal4.

      We got the mutant line from Dr. Craig Montell who named it. We explained it like the following in the main text: “controlled by GAL4 knocked into the trpg locus (trpg<sup>G4</sup> flies; +)” line 109.

      (2) The arrows in Figures 7O and 7P need to be explained in the figure legends.

      We did.

      Reviewer #3 (Recommendations For The Authors):

      (11) Lines 95-96 should have a reference.

      We did.

      (12) Lines 129-130: It should read "TRPγ expressed in DH44 cells is sufficient for the regulation of lipid levels."

      We changed it as suggested.

      (13) Figure 5E needs to be repeated with more trials.

      We increased the n numbers. Previously (Figure 5E) we included area of 10 LDs from 3 samples, and in revised figure (Figure 6I) we have included 28 LDs from 10 samples.

      (14) Figures 5F-I, bold lines are not too visible and therefore, dotted lines could be used.

      We changed it as suggested.

      (15) Line 356: It is not true that D-trehalose or D-fructose is commonly detected by DH44 neurons. These sugars at concentrations much higher than the physiological concentration range stimulate DH44 neurons (see Dus et al., 2015).

      We removed it.

      (16) Lines 362-363: It should read "Expression of TRPγ in DH44 neurons was necessary and sufficient to regulate the carbohydrate and lipid levels.".

      We changed it.

      (17) Lines 369-370: The authors need to consider removing the possible role of CRF in regulating lipid homeostasis. It could be considered to be far-fetched.

      We removed it.

      (18) Line 407-408: the sentence "Nevertheless, it is also known that DH44 neurons mediate the influence of dietary amino acids on promoting food intakes in flies (37)" needs to be removed. They used amino acid concentrations that were far greater than the physiological levels observed in the internal milieu of flies. Still, many laboratories cannot reproduce the result of using the high AA concentrations.

      We removed it.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (public review): 

      This manuscript presents SAVEMONEY, a computational tool designed to enhance the utilization of Oxford Nanopore Technologies (ONT) long-read sequencing for the design and analysis of plasmid sequencing experiments. In the past few years, with the improvement in both sequencing length and accuracy, ONT sequencing is being rapidly extended to almost all omics analyses which are dominated by short-read sequencing (e.g., Illumina). However, relatively higher sequencing errors of long-read sequencing techniques including PacBio and ONT is still a major obstacle for plasmid/clone-based sequencing service that aims to achieve single base/nucleotide accuracy. This work provides a guideline for sequencing multiple plasmids together using the same ONT run without molecular barcoding, followed by data deconvolution. The whole algorithm framework is well-designed, and some real data and simulation data are utilized to support the conclusions. The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services. As we know and discussed by the authors, in the real world, to ensure accuracy, the researchers will routinely pick up multiple colonies in the same plasmid construction and submit for Sanger sequencing. However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches. This is a major limitation in the significance of this work. Encouraging computational efforts in ONT data debarcoding for mixed-plasmid or even single-cell sequencing would be more valuable in the field.

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services.

      We apologize that we were not clear enough in the manuscript. Our tool is designed for users who rely on commercial services (i.e., those who cannot include a barcode by themselves). However, it can also benefit those performing library preparation, as SAVEMONEY can be applied after standard barcode-based sequencing and de-multiplexing. The combination of standard barcodes with SAVEMONEY would significantly expand the scope of sequencing applications. For example, it would enable sequencing of more plasmid types than the number of available barcodes and, in some cases, it may even eliminate the need for barcode introduction. Because we do not own ONT equipment and because the primary target audience for the SAVEMONEY algorithm is users without ONT equipment, we were not able to conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph describing these issues (3rd paragraph in the discussion section).

      However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches.

      We agree with the reviewer about this limitation of SAVEMONEY, as it does not allow mixing of plasmids from multiple colonies in the same cloning run. However, that does not necessarily mean that SAVEMONEY cannot reduce sequencing costs in cloning. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. However, with SAVEMONEY, up to three plasmids can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.
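
      The arithmetic in this example can be sketched as follows. This is an illustrative calculation only, not part of SAVEMONEY: the function name samples_needed is hypothetical, the $15 price is the approximate per-sample figure quoted elsewhere in this response, and the sketch assumes that the only constraints are the maximum number of plasmids per sample and that colonies of the same construct never share a sample (it ignores any sequence-similarity checks between different constructs).

```python
import math

def samples_needed(n_constructs, colonies_per_construct, max_plasmids_per_sample):
    """Minimum number of mixed samples under two assumed constraints:
    (a) a sample holds at most max_plasmids_per_sample plasmids, and
    (b) colonies of the same construct must go into different samples."""
    total_plasmids = n_constructs * colonies_per_construct
    by_capacity = math.ceil(total_plasmids / max_plasmids_per_sample)
    # Constraint (b) forces at least one sample per colony of a construct.
    return max(colonies_per_construct, by_capacity)

PRICE_PER_SAMPLE_USD = 15  # assumed approximate commercial price per sample

n_constructs, colonies, max_mix = 3, 2, 3
unmixed_cost = n_constructs * colonies * PRICE_PER_SAMPLE_USD
mixed_cost = samples_needed(n_constructs, colonies, max_mix) * PRICE_PER_SAMPLE_USD

print(f"unmixed: ${unmixed_cost}, mixed: ${mixed_cost}")
```

      With three constructs and two colonies each, the sketch gives two mixed samples instead of six, so the total cost (and hence the cost per plasmid) falls to one-third.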

      (1) To provide more comprehensive information for users who care about the cost, the Introduction section should include a cost comparison between Sanger and ONT, with more details, such as different ONT platforms (MinION, PromethION, Flongle), chemistries (flow cells) and kits. This additional information will be more helpful and informative for the users who have their own sequencers and are the target audience for SAVEMONEY.

      We thank the reviewer for pointing this out. Since we do not own ONT equipment, we are unable to provide a total cost for using the ONT platform. However, we have included the price per sample (~$15 per plasmid) for the commercial service we have used, as well as the equipment that they employ (V14 chemistry on a PromethION with an R10.4.1 flow cell) and the number of reads obtained per plasmid (~100–1000) in the 4th paragraph of the introduction section. Though these costs will inevitably change over time, this information should still be helpful for those who own ONT sequencers in estimating the costs.

      (2) In "Overview of the algorithm" (Pages 3-4) under the Results section, instead of stating "However, coverage varies from ~100-1000 and is difficult to predict because each nanopore flow cell has different properties.", it will be beneficial to provide more detailed information, such as sequencing length, yield/read count per flow cell of different platforms. This information will assist users in designing their own experiments effectively.

      We thank the reviewer for the comment. As mentioned in the previous response, we are unable to provide sequencing length, yield/read count per flow cell because we do not own ONT equipment. However, we apologize if it was not clear in the "Overview of the algorithm" section that we are discussing the use of results obtained from commercial services, and therefore we need to provide more detailed information about the results from the commercial service. We have now clarified in the sentence pointed out by the reviewer that the numbers are derived from the information provided by commercial sequencing services. In addition, we have also added, at the end of the same paragraph, that typical examples of the result properties, i.e., read length and quality score distribution, can be found in Fig. 2.

      (3) While this study optimized and evaluated the tool using a total of 14 plasmids, it may not provide sufficient power to represent the diversity of the plasmid world. Consideration should be given to expanding the dataset to include a broader range of plasmids in future studies to enhance the robustness and generalizability of the tool.

      We are grateful to the reviewer for their valuable input. It is reasonable to expect that a larger number of plasmids would be used, even though the main target of SAVEMONEY is those who utilize commercial services. In the previous version of SAVEMONEY, it was not possible to process the data in a reasonable amount of time if too many plasmids were provided, though the algorithm itself has no restrictions based on the number of plasmids. Therefore, we have changed the underlying code to improve the algorithm, making it more than 20 times faster than the previous version (the benchmark time mentioned in the 3rd paragraph of the discussion section was improved to 3.1 minutes from the previous 65 minutes, using the same dataset and the same computer). Additionally, SAVEMONEY is now compatible with multiprocessing. The processing time is expected to decrease approximately in inverse proportion to the number of CPU cores used. We have added these updates at the end of the 3rd paragraph in the discussion section.

      (4) If applicable and feasible, including a comparison or benchmark of SAVEMONEY against other similar tools would further strengthen the manuscript. This comparison would allow users to evaluate the advantages and disadvantages of different tools for their specific needs.

      We thank the reviewer for the suggestion. We have added a benchmark using a similar tool, On-Ramp, with the exact same set of plasmids and FASTQ data used for our benchmark (4th paragraph in the discussion section). Because the machine specifications used in the On-Ramp web server are unknown, a direct comparison is not possible. However, using only laptop-level computational resources, SAVEMONEY was able to process the data 38% faster than On-Ramp. When using mini-PC level computational resources, the processing time was 64% faster than On-Ramp.

      (5) The importance of pre-filtering raw sequencing reads should be emphasized as noisy reads can significantly impact the overall performance of the tool. It is essential to clarify whether any pre-filtering steps were performed in this study, such as filtering based on quality scores, read length, or other relevant factors. 

      We apologize for not being clear. Unfortunately, the commercial sequencing service we used did not provide the information regarding pre-filtering. However, the impact of pre-filtering based on quality score and read length on the quality of the final results is theoretically minimal in SAVEMONEY. First, during the initial step of the post-analysis, the classification step, reads that are short compared to the full plasmid length can be excluded based on the user-defined “score_threshold”. Simultaneously, low-quality reads with poor alignment to the plasmid can also be excluded, because “score_threshold” is related to the normalized alignment score. Even if there are low-quality reads that are not excluded at this stage, the effect can be minimized during the final step of the post-analysis that generates consensus sequences. This is because our Bayesian analysis considers not only the base calling but also the q-scores to determine the consensus. Therefore, we believe the overall impact of pre-filtering on the final results is negligible.
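To illustrate why q-scores limit the influence of noisy reads, here is a minimal sketch of a per-position Bayesian base call. It is a generic textbook-style model, not SAVEMONEY's actual implementation: the uniform prior, the equal split of the error probability among the three other bases, and the names `phred_to_error` and `base_posterior` are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """Convert a Phred q-score to an error probability, capped at 0.75.

    At 0.75 a call carries no information under this four-base model.
    """
    return min(10 ** (-q / 10), 0.75)


def base_posterior(calls):
    """Posterior over the true base at one aligned position.

    calls: list of (called_base, q_score) pairs, one per read.
    Assumes a uniform prior and independent reads (illustrative only).
    """
    log_like = {b: 0.0 for b in BASES}
    for called, q in calls:
        e = phred_to_error(q)
        for b in BASES:
            p = (1 - e) if called == b else e / 3
            log_like[b] += math.log(p)
    # Normalize in a numerically stable way.
    top = max(log_like.values())
    weights = {b: math.exp(v - top) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


if __name__ == "__main__":
    # Two confident "A" calls outweigh one low-quality "C" call.
    posterior = base_posterior([("A", 30), ("A", 25), ("C", 5)])
    print(max(posterior, key=posterior.get), posterior)
```

In this toy model a low-quality read contributes a nearly flat likelihood, so it barely shifts the consensus even if it is not filtered out beforehand.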

      (6) The statement regarding the number of required reads per plasmid (20-30) and the maximum number of plasmids (up to six) that can be mixed in a single run may become outdated due to the rapid advancements in ONT technology. In the Discussion section, instead of assuming specific numbers, it would be more beneficial to provide information based on the current state of ONT sequencing, such as the number of reads per MinION flow cell that can be produced.

      We thank the reviewer for pointing this out. Because the number of required reads per plasmid depends on the accuracy of each read (i.e., the number of required reads can be reduced if the accuracy increases), we have added the description of these points to the last paragraph of the discussion section.

      Reviewer #2 (public review):  

      The authors developed an algorithm that allows for deconvolution of plasmid sequences from a mixture of plasmids that have been sequenced by nanopore long read technology. As library preparations and barcoding of individual samples increase sequencing costs, the algorithm bypasses this need and thus decreases time on sample prep and sequencing costs. In the first step, the tool assesses which of the plasmid constructions can be mixed in a single library preparation by calculating a distance matrix between the reference plasmid and the constructions producing sequence clusters. The user is given groups of plasmids, from different clusters, to be pooled together for sequencing. After sequencing, the algorithm deconvolutes the reads by classifying them based on alignments to the reference sequence. A Bayesian analysis approach is used to obtain a consensus sequence and quality scores.

      Strengths 

      The authors exploit one of the main advantages of long-read sequencing which is to accurately resolve regions of high complexity, as regularly found in plasmids, and developed a tool that can validate plasmid constructions by reducing sequencing costs. Multiple plasmids (up to six) can be analyzed simultaneously in a single library without the need for sample barcoding, also reducing sample preparation time. Although inserts must be different, just 2 bases difference would be enough for a correct assignation. It maximizes cost-efficiency for projects that require large amounts of plasmid constructions and high-throughput validation.

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      Weaknesses 

      The method proposed by the authors requires prior knowledge of plasmid sequences (i.e., blueprints or plasmid reference) and is not suitable for small experiments. The plasmid inserts or backbones must be different e.g., multiple colonies from the same plasmid construction effort cannot be submitted together.

      As also discussed in the response to reviewer 1, we agree with the reviewer that SAVEMONEY does not allow the analysis of plasmids from multiple colonies in the same cloning experiment. However, that does not necessarily mean that SAVEMONEY cannot reduce the sequencing cost. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. However, with SAVEMONEY, up to three plasmids can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.
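The cost example above can be written out explicitly. The sketch below assumes a flat price per submitted sample and uses hypothetical construct names; it pools colony i of every construct into sample i so that no sample contains two colonies of the same construct.

```python
# Two colonies from each of three different constructs (names are placeholders).
constructs = ["constructA", "constructB", "constructC"]
colonies_per_construct = 2

plasmids = [(c, i) for c in constructs for i in range(colonies_per_construct)]

# Pool colony i of every construct into sample i, so that colonies of the
# same construct never share a sample.
samples = {}
for construct, colony in plasmids:
    samples.setdefault(colony, []).append(construct)

standard_samples = len(plasmids)   # one sample per plasmid
pooled_samples = len(samples)      # one sample per colony index
cost_ratio = pooled_samples / standard_samples

if __name__ == "__main__":
    print(standard_samples, "samples without pooling")
    print(pooled_samples, "samples with pooling")
    print("relative cost per plasmid:", round(cost_ratio, 3))
```

The same layout generalizes: with k colonies per construct, k pooled samples are needed regardless of the number of constructs, up to the limit on plasmids per sample.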

      The reviewer also expressed concern that SAVEMONEY is not suitable for experiments at a small scale. To put it more precisely, SAVEMONEY cannot be used when the experiment size is minimal, such as in a lab that consistently constructs only a single plasmid at a time. That said, the strength of SAVEMONEY lies in its scalability. Even in labs where plasmid construction is typically limited to one at a time, there may be occasional instances where two or more plasmids are created simultaneously. In such cases, SAVEMONEY can be used to reduce sequencing costs. Moreover, in a typical molecular biology lab where multiple plasmids are constructed every week, SAVEMONEY can be particularly effective. Given its adaptability and cost-saving potential and widespread use since its initial publication on bioRxiv and on Google Colab, we are confident that SAVEMONEY will continue to be a valuable tool for a wide range of researchers.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors): 

      The manuscript assumes all samples are sent out for sequencing at a specific company. This could be generalized for a much broader use since many labs now own nanopore sequencers. In turn, the advantage of reducing hands-on sample prep becomes more evident.

      We thank the reviewer for pointing this out. We agree that SAVEMONEY can also benefit those performing library preparation. Combining standard barcodes with SAVEMONEY significantly expands the scope of sequencing applications. For example, it enables sequencing of more plasmid types than the number of available barcodes and, in some cases, may even eliminate the need for the sample prep step that introduces barcodes. Because we do not own ONT equipment, we could not conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph (3rd paragraph in the discussion section).

      The base calling model (high accuracy, super accuracy) used by Plasmidsaurus and tested here should be mentioned.  

      We thank the reviewer for the suggestion. The description about the base calling model (HAC) was added in Materials and Methods section.

      Other modifications to the revised manuscript 

      Beyond the changes made in response to the reviewer comments above, we have also, through our continued use and improvement of SAVEMONEY, made additional changes to the algorithm and therefore to the manuscript. Those changes are outlined below.

      Improvements in the pre-survey step

      (1) The pre-survey algorithm was reduced to a Zero-One Integer Linear Programming Problem to guarantee the optimal combinations, as previous versions did not ensure an optimal solution. Relatedly, the explanation of the algorithm in the main manuscript was updated.

      (2) The algorithm was modified to ensure that the number of plasmids distributed to each group is balanced. A new feature was also added to allow users to specify the number of groups, which is beneficial when balancing between cost and quality.

      (3) An error was corrected in Fig. 2, where the distance calculation method for the hierarchical clustering step for group formation was Farthest Point Algorithm, which calculates distance between two clusters based on the farthest pair of plasmids. The correct method is the Nearest Point Algorithm. This error was present only in Fig. 2, while other implementations, including source code of SAVEMONEY and Google Colab page, were correct from the beginning. We have corrected the error in Fig. 2.
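The difference between the two cluster-distance rules named in item (3) can be seen on a toy example. The sketch below is illustrative only: the plasmid names and pairwise distances are hypothetical, and the functions are plain-Python stand-ins for what is commonly called single linkage (Nearest Point Algorithm) and complete linkage (Farthest Point Algorithm), not code taken from SAVEMONEY.

```python
from itertools import product

# Hypothetical pairwise distances between four plasmids.
pairwise = {
    ("p1", "p2"): 2, ("p1", "p3"): 40, ("p1", "p4"): 55,
    ("p2", "p3"): 3, ("p2", "p4"): 60, ("p3", "p4"): 4,
}


def distance(x: str, y: str) -> int:
    """Symmetric lookup into the pairwise distance table."""
    return pairwise[tuple(sorted((x, y)))]


def nearest_point(cluster_a, cluster_b) -> int:
    """Cluster distance = distance of the closest cross-cluster pair."""
    return min(distance(x, y) for x, y in product(cluster_a, cluster_b))


def farthest_point(cluster_a, cluster_b) -> int:
    """Cluster distance = distance of the farthest cross-cluster pair."""
    return max(distance(x, y) for x, y in product(cluster_a, cluster_b))


if __name__ == "__main__":
    a, b = ["p1", "p2"], ["p3", "p4"]
    print("nearest point:", nearest_point(a, b))
    print("farthest point:", farthest_point(a, b))
```

With the nearest-point rule, two clusters count as close as soon as any one pair of plasmids across them is close, whereas the farthest-point rule is governed by the most dissimilar pair.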

      Modifications in figures, manuscripts, and other aspects

      (1) Fig. 3 was updated to reflect the update of SAVEMONEY, although it did not show any important differences.

      (2) Parameter names were updated as follows:

      “threshold (pre)” -> “distance_threshold”

      “threshold (post)” -> “score_threshold”

      Added “number_of_groups”

      (3) The order of elements was rearranged in Fig. 4.

      (4) Incorrect calculations were fixed in Fig. 4g, h, and i (old Fig. 4d, h, and l). Related to that, Fig. 4j, k, and l and Table 1 were added, in addition to the explanation in the main manuscript.

      (5) SAVEMONEY was packaged and was released on PyPI to facilitate easy installation and integration by other developers.

      (6) SAVEMONEY was updated and expanded to accommodate linear DNA fragments, such as PCR amplicons and long synthetic DNA. Users can select the topology of DNA by specifying that as an option. A description of this new capability was added at the end of “Overview of the algorithm” section.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (…) some concerns with interpretations and technical issues make several major conclusions in this manuscript less rigorous, as explained in detail in comments below. In particular, the two major concerns I have: 1) the contradiction between the strong reduction of global translation, with puromycin incorporation gel showing no detectable protein synthesis in cold, and an apparently large fraction of transcripts whose abundance and translation in Fig. 2A are both strongly increased. 2) The fact that no transcripts were examined for dependence on IRE-1/XBP-1 for their induction by cold, except for one transcriptional reporter, and some weaknesses (see below) in data showing activation of IRE-1/XBP-1 pathway. The conclusion for induction of UPR by cold via specific activation of IRE-1/XBP-1 pathway, in my opinion, requires additional experiments.

      Relating to the first point, the results of puromycin incorporation and ribosome profiling are not contradictory. The former shows absolute changes in translation, i.e. changes in how much protein the cell is producing, while the latter shows relative changes between the produced proteins, i.e. how the cell prioritizes its protein production. An observed up-regulation in ribosome profiling does not necessarily mean (but could) that the corresponding protein goes up in absolute terms (units produced per time). Instead, it implies that out of the population of all translating ribosomes, a larger fraction is translating (prioritizing) this particular mRNA relative to other mRNAs. The second point is addressed later in the response.
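A small numerical illustration may help separate the two measures. All numbers below are hypothetical and chosen only to show that a transcript's share of translating ribosomes can double while its absolute protein output still falls, provided that global translation drops enough.

```python
# Hypothetical values, for illustration only.
# Global protein synthesis (arbitrary units), as a puromycin assay would report.
total_output = {"20C": 100.0, "4C": 40.0}
# Fraction of all translating ribosomes found on one mRNA, as ribosome
# profiling would report.
gene_share = {"20C": 0.01, "4C": 0.02}

# Relative change: how the cell prioritizes this mRNA.
relative_change = gene_share["4C"] / gene_share["20C"]

# Absolute change: units of this protein produced per unit time.
absolute_output = {t: total_output[t] * gene_share[t] for t in total_output}
absolute_change = absolute_output["4C"] / absolute_output["20C"]

if __name__ == "__main__":
    print("relative (profiling-like) change:", round(relative_change, 2))
    print("absolute (output) change:", round(absolute_change, 2))
```

Here the relative measure rises while the absolute measure falls, which is the situation described in the paragraph above.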

      Major concerns:

      (1) Fig. 1B shows polysomes still present on day 1 of 4ºC exposure, but the gel in Fig. 1C suggests a complete lack of protein synthesis. Why?

      We realized that the selected gel exposure may give the false impression of a complete lack of puromycin incorporation at 4ºC. To avoid confusion, we now show in Figure 1 – figure supplement 1 the original gel image next to its longer exposure. The quantification of puromycin incorporation remains in Fig. 1C (it is based on 3 biological replicates and only one replicate is shown in the corresponding supplement). We hope it is now clear that there is an ongoing puromycin incorporation/translation at 4ºC, albeit much reduced compared with 20ºC.

      What is then the evidence that ribosomal footprints used in much of the paper as evidence of ongoing active translation are from actual translating rather than still bound to transcripts but stationary ribosomes, considering that cooling to 4ºC is often used to 'freeze' protein complexes and prevent separation of their subunits? The authors should explain whether ribosome profiling as a measure of active translation has been evaluated specifically at 4ºC, or test this experimentally.

      While the ribosomal profiling alone might not prove ongoing translation, the residual puromycin incorporation does (see the longer gel exposure in Figure 1 – figure supplement 1). To strengthen this argument, we selected two additional genes (cebp-1 and numr-1) whose ribosomal footprints increase in the cold, and whose GFP-fusions were available from the CGC. Monitoring their expression, we observed the expected increase in the cold (see Figure 2 – figure supplement 3 A-B). The ongoing translation in the cold is also in line with our previous study (Peke et al., 2022), where we observed de novo protein synthesis of other proteins under the same cooling conditions as in this study.

      They should also provide some evidence (like Western blots) of increases in protein levels for at least some of the strongly cold-upregulated transcripts, like lips-11.

      As explained above, we addressed it by additionally examining two strains expressing GFP-fused proteins, whose translation in the cold is predicted to increase according to our ribosomal profiling data. See the new Figure 2 – figure supplement 3 A-B.

      As puromycin incorporation seems to be the one direct measure of global protein synthesis here, it conflicts with much of the translation data, especially considering that quite a large fraction of transcripts have increased both mRNA levels and ribosome footprints, and thus presumably increased translation at 4ºC, in Fig. 2A.

      We hope the above explanations put this concern to rest.

      Also, it is not clear how quantitation in Fig. 1C relates to the gel shown, the quantitation seems to indicate about 50-60% reduction of the signal, while the gel shows no discernable signal.

      As above, see a longer western blot exposure in Figure 1 – figure supplement 1 and note that the quantification is based on three biological replicates.

      (2) It is striking that plips-11::GFP reporter is induced in day 1 of 4ºC exposure, apparently to the extent that is similar to its induction by a large dose of tunicamycin (Fig. 3 supplement),

      We did not intend to compare the extent of induction between cold and tunicamycin treatment. The tunicamycin experiment was meant to confirm that, as suggested by expression data from Shen et al. 2005, lips-11 is upregulated upon UPR activation.

      …but the three IRE-1 dependent UPR transcripts from Shen 2005 list were not induced at all on day 1 (Fig. 4 supplement). Moreover, the accumulation of the misfolded CPL-1 reporter, that was interpreted as evidence that misfolding may be triggering UPR at 4ºC, was only observed on day 1, when the induction of the three IRE-1 targets is absent, but not on day 3, when it is stronger. How does this agree with the conclusion of UPR activation by cold via IRE-1/XBP-1 pathway?

      In the originally submitted supplemental figure, we compared mRNA levels between day 1 animals at 20ºC versus 4ºC. However, as argued later by this reviewer, it may be better to use day 0 animals at 20ºC as the reference (since at 20ºC the animals will continue producing embryos). Thus, we repeated the RT-qPCR analysis with additional time points (and genes relevant to other comments). This analysis, now in Figure 4 – figure supplement 2, shows that these mRNAs (dnj-27, srp-7, and C36B7.6) increased already at day 1 in the cold compared with the reference 20ºC animals on day 0, and their levels increased further on day 3.

      It is true that the authors do note very little overlap between IRE-1/XBP-1-dependent genes induced by different stress conditions, but for most of this paper, they draw parallels between tunicamycin-induced and cold induced IRE-1/XBP-1 activation.

      We carefully re-examined the manuscript to ensure that we do not draw parallels between cold and tunicamycin treatment. The three genes (dnj-27, srp-7, and C36B7.6) were taken from Shen et al. because that study reported lips-11 as an IRE-1-responsive gene, which we realized thanks to the Wormbase annotation of lips-11. Examining the three genes in our expression data, srp-7 (like lips-11) is also upregulated more than 2-fold, while the other two genes go up but less than 2-fold. As mentioned by the reviewer, we note little overlap between the different stress conditions suggesting that the response is context dependent. Additional differences may arise if, as we hypothesize, UPR is activated in the cold in response to both protein and lipid stress. Note that the 2-fold cutoff used in the previous Figure 7 – figure supplement 1 was (erroneously) on the log2 scale, so showed genes upregulated at least 4-fold. We now corrected it to 2-fold. While there are now a few more overlapping genes, the overall conclusion, that there is little overlap between different conditions, did not change. We now list the shared genes in the new Supplementary file 5.
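The cutoff mix-up described above is easy to reproduce: a threshold of 2 applied to log2 fold changes selects genes that are at least 4-fold up, not 2-fold. The gene names and fold changes in this sketch are hypothetical.

```python
import math

# Hypothetical fold changes (cold vs. reference) for four placeholder genes.
fold_changes = {"geneA": 1.5, "geneB": 2.5, "geneC": 3.0, "geneD": 5.0}
log2_fc = {gene: math.log2(fc) for gene, fc in fold_changes.items()}

# Applying "2" directly on the log2 scale keeps only genes >= 4-fold up.
cutoff_on_log2_scale = [g for g, v in log2_fc.items() if v >= 2]

# The intended 2-fold cutoff corresponds to log2(2) = 1.
cutoff_two_fold = [g for g, v in log2_fc.items() if v >= math.log2(2)]

if __name__ == "__main__":
    print("threshold 2 on log2 scale:", cutoff_on_log2_scale)
    print("intended 2-fold threshold:", cutoff_two_fold)
```

Relaxing the cutoff from 4-fold to 2-fold can only enlarge the gene lists, which is consistent with a few more overlapping genes appearing after the correction.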

      The conclusion that "the transcription of some cold-induced genes reflects the activation of unfolded protein response (UPR)..." is based on analysis of only one gene, lips-11. No other genes were examined for IRE-1 dependence of their induction by cold, neither the other 8 genes that are common between the cold-induced genes here and the ER stress/IRE-1- induced in Shen 2005 (Venn diagram in Figure 7 supplement), nor the hsp-4 reporter. What is the evidence that lips-11 is not the only gene whose induction by cold in this paper's dataset depends on IRE-1? This is a major weakness and needs to be addressed.

      Furthermore, whether induction by cold of lips-11 itself is due to IRE1 activation was not tested, only a partial decrease of reporter fluorescence by ire-1 RNAi is shown. A quantitative measure of the change of lips-11 transcript in ire-1 and xbp-1 mutants is needed to establish if it depends on IRE-1/XBP-1 pathway.

      We now examined by RT-qPCR if the induction of the three genes from Shen et al. (dnj-27, srp-7, and C36B7.6), as well as lips-11 and hsp-4, depends on IRE-1. In the new Figure 4 – figure supplement 2, we show that the upregulation of all these genes is reduced in the cold in the ire-1 mutant (although in the wild type, the increase of hsp-4 mRNA appeared to be non-significant, despite the observed upregulation of the hsp-4 GFP reporter).

      The authors could provide more information and the additional data for the transcripts upregulated by both ER stress and cold, including the endogenous lips-11 and hsp-4 transcripts: their identity, fold induction by both cold and ER stress, how their induction is ranked in the corresponding datasets (all of these are from existing data), and do they depend on IRE-1/XBP-1 for induction by cold?

      As above, the dependence of endogenous lips-11 and hsp-4 on IRE-1 is now shown in the new Figure 4 – figure supplement 2, and the shared genes from Figure 7 – figure supplement 1 are listed in the new Supplementary file 5. We did not perform additional analysis comparing various data sets, as we felt that understanding the differences between IRE-1-mediated transcription outputs across different conditions goes well beyond this study.

      Without these additional data and considering that the authors did not directly measure the splicing of xbp-1 transcript (see comment for Fig. 3 below), the conclusion that cold induces UPR by specific activation of IRE-1/XBP-1 pathway is premature.

      To address the splicing of endogenous xbp-1, we examined our ribosome profiling data for the translation of spliced xbp-1, and found that the spliced variant is more abundant in the cold. This data is now shown in Figure 3 – figure supplement 2B.

      There are also technical issues that are making it difficult to interpret some of the results, and missing controls that decrease the rigor of conclusions:

      (1) For RNAseq and ribosome occupancy, were the 20ºC day 1 adult animals collected at the same time as the other set was moved to 4ºC, or were they additionally grown at 20ºC for the same length of time as the 4ºC incubations, which would make them day 2 adults or older at the time of analysis? This information is only given for SUnSET: "animals were cultivated for 1 or 3 additional days at 4ºC or 20ºC".

      In the RNAseq experiments, the 20ºC animals were collected at the same time as the others were moved to 10ºC (and then 4ºC), so they were not additionally grown at 20ºC. We now make this clear in Methods.

      This could be a major concern in interpreting translation data: First, the inducibility of both UPR and HSR in worms is lost at exactly this transition, from day 1 to day 2 or 3 adults, depending on the reporting lab (for example Taylor and Dillin 2013, Labbadia and Morimoto, 2015, De-Souza et al 2022).

      As explained above, the 20ºC animals were collected at the same time as the others were moved to 4ºC. In addition, we reported before that ageing appears to be suppressed in animals incubated at 4ºC (Habacher et al., 2016; Figure S1C). Thus, in terms of their biological age, cold-incubated animals appear to be closer to the 20ºC animals at the time they are moved to the cold (day 0). The ageing-associated deterioration in UPR inducibility mentioned above therefore presumably does not apply to cold-incubated animals, which is in line with the observed IRE-1-dependent upregulation of several genes in day 3 animals at 4ºC.

      How do authors account for this? Would results with reporter induction, or induction of IRE-1 target genes in Fig. 4, change if day 1 adults were used for 20ºC?

      Our analysis in Figure 4 – figure supplement 2 now includes 20ºC animals at day 0, 1, and 3.

      Second, if animals at the time of shift to 4ºC were only beginning their reproduction, they will presumably not develop further during hibernation, while an additional day at 20ºC will bring them to the full reproductive capacity. Did 4ºC and 20ºC animals used for RNAseq and ribosome occupancy have similar numbers of embryos, and were the embryos at similar stages?

      As explained above, the reference animals at 20ºC were young adults containing few embryos. Indeed, at 4ºC the animals do not accumulate embryos. Although we cannot say that for all genes, note that the genes analysed in Figure 4 – figure supplement 2 increase in abundance also when compared with the day 3 animals kept at 20ºC.

      (2) Second, no population density is given for most of the experiments, despite the known strong effects of crowding (high pheromone) on C. elegans growth. From the only two specifics that are given, it seems that very different population sizes were used: for example, 150 L1s were used in survival assay, while 12,000 L1s in SUnSET. Have the authors compared results they got at high population densities with what would happen when animals are grown in uncrowded plates? At least a baseline comparison in the beginning should have been done.

      None of the experiments involved crowded populations. In the SUnSET experiments, we just used larger and more plates to obtain sufficient material.

      (3) Fig. 3: it is unclear why the accepted and well characterized quantitative measure of IRE1 activation, the splicing of xbp-1 transcript, is not determined directly by RT-PCR. The fluorescent XBP-1spliced reporter, to my knowledge, has not been tested for its quantitative nature and thus its use here is insufficient. Furthermore, the image of this fluorescent reporter in Fig. 3b shows only one anterior-most row of cells of intestine, and quantitation was done with 2 to 5 nuclei per animal, while lips-11 is induced in entire intestine. Was there spliced XBP-1 in the rest of the intestinal nuclei? Could the authors show/quantify the entire animal (20 intestinal cells) rather than one or two rows of cells?

      As explained above, we now included the analysis of xbp-1 splicing in Figure 3 – figure supplement 2B. As for the fluorescent reporter, it is difficult to measure all gut nuclei since part of the gut is occluded by the gonad. Nonetheless, we do see induction of the reporter in other gut nuclei and show now additional examples from midgut in Figure 3 – figure supplement 2A.  

      (4) The differences in the outcomes from this study and the previous one (Dudkevich 2022) that used 15ºC to 2ºC cooling approach are puzzling, as they would suggest two quite different IRE-1 dependent programs of cold tolerance. It would be good if authors commented on overlapping/non-overlapping genes, and provided their thoughts on the origin of these differences considering the small difference in temperatures.

      Indeed, there seem to be substantial differences between different temperatures and cooling paradigms. While understanding the C. elegans responses to cold is still in its infancy, one possible explanation for the observed differences is that we used different starting growth temperatures. While the initial populations in our study were grown at 20ºC, Dudkevich et al. used 15ºC. Worms display profound physiological differences between these two temperatures. For example, Xiao et al. (2013) showed that the cold-sensitive TRPA-1 channel is important at 15ºC but not 20ºC. Thus, the trajectories along which worms adapt to near freezing temperature may vary depending on their initial physiological state (and perhaps the target temperature, as we used 4ºC and they 2ºC). We now expanded argumentation on this topic in Discussion. I should also say that we planned on testing NLP-3 function in our paradigm, but our request for strains remained unanswered.

      Second, have the authors performed a control where they reproduced the rescue by FA supplementation of poor survival of ire-1 mutants after the 15ºC to 2ºC shift? Without this or another positive control, and without measuring change in lipid composition in their own experiments, it is unclear whether the different outcomes with respect to FAs are due to a real difference in adaptive programs at these temperatures, or to failure in supplementation?

      While we did not re-examine the findings by Dudkevich et al., we did include now another positive control. As reported by Hou et al. (2014), supplementing unsaturated FAs rescues the induction of the hsp-4 reporter in fat-6 RNAi-ed animals. Although we were able to reproduce that result (Figure 6 – figure supplement 1), the same supplementation procedure did not suppress the lips-11 reporter (Figure 6 – figure supplement 2).

      (5) Have the authors tested whether and by how much ire-1(ok799) mutation shortens the lifespan at 20ºC? This needs to be done before the defect in survival of ire-1 mutants in Fig. 7a can be interpreted.

      The lifespan at standard cultivation temperature was examined by others (Henis-Korenblit et al., 2010; Hourihan et al., 2016), showing that ire-1(ok799) mutants live shorter. However, while some mechanisms that prolong lifespan may also improve cold survival, the two phenomena are not identical and whether IRE-1 facilitates longevity and cold survival in the same or different way remains to be seen.

      Reviewer #2:

      (1) The conclusions regarding a general transcriptional response are based on one gene, lips-11, which does not affect survival in response to cold. We would suggest altering the title, to replace "Reprograming gene expression" with "Regulation of the lipase lips-11".

      We now examined IRE-1 dependent induction of additional genes – see Figure 4 – figure supplement 2. While we do not know what fraction of cold-induced genes depends on IRE-1, we feel that our findings justify the statement that gene expression in the cold involves the IRE-1/XBP-1 pathway (title) or that the transcription of some/a subset of cold-induced genes depends on this pathway (in abstract, model, and discussion).

      (2) There is no gene ontology with the gene expression data.

      We now included the top 10 most enriched and suppressed gene categories between 10ºC and 4ºC (since the biggest change happens between these conditions, as shown in Figure 2 – figure supplement 1A). This is now included in the Figure 2 – figure supplement 2.

      (3) Definitive conclusions regarding transcription vs translational effects would require use of blockers such as alpha-amanitin or cycloheximide.

      As explained also for reviewer 1, we confirmed now that at least some genes, whose translation is upregulated based on the ribosome profiling, are indeed upregulated in the cold at the protein level (Figure 2 – figure supplement 3A-B). Thus, the increase in ribosomal occupancy seems to accurately reflect increased translation. Since mRNA levels correlate overall with the ribosomal occupancy, it appears that the mRNA levels are the main determinants of the translation output. Because the lips-11 promoter is sufficient to upregulate the GFP reporter in the cold, it further suggests that the regulation happens at the transcription level. It is true that at this point we cannot completely rule out the effects of mRNA stability, which we clearly acknowledge in the discussion.

      (4) Conclusions regarding the role of lipids are based on supplementation with oleic acid or choline, yet there is no lipid analysis of the cold animals, or after lips-1 knockdown.

      We agree that this is an important direction for future studies but feel that lipidomic analysis goes beyond the scope of current work.

      Although choline is important for PC production, adding choline in normal PC could have many other metabolic impacts and doesn't necessarily implicate PC without lipidomic or genetic evidence.

      We agree and acknowledge it now in Discussion: “However, choline also plays other roles, including in neurotransmitter synthesis and methylation metabolism. Thus, we cannot yet rule out the possibility that the protective effects of choline supplementation stem from functions outside PC synthesis.”

      Reviewer #3:

      The study has several weaknesses: it provides limited novel insights into pathways mediating transcriptional regulation of cold-inducible genes, as IRE-1 and XBP-1 are already well-known responders to endoplasmic reticulum stress, including that induced by cold.

      We presume the reviewer refers to the study by Dudkevich et al. (2022). As explained in our manuscript, there are important differences between that study and ours in how the IRE-1 signalling is utilized and to what ends.

      Additionally, the weak cold sensitivity phenotype observed in ire-1 mutants casts doubt on the pathway's key role in cold adaptation. The study also overlooks previous research (e.g., PMID: 27540856) that links IRE-1 to SKN-1, another major stress-responsive pathway, potentially missing important interactions and mechanisms involved in cold adaptation.

      We state in the manuscript that the IRE-1 pathway plays a modest but significant role in cold adaptation and state in the Fig. 7 model and Discussion that additional pathways work alongside IRE-1 to drive cold-specific gene expression.

      Recommendations for the authors:

      Reviewer #1:

      Minor comments:

      (1) Fig. 2B - reporter expression seems to be already present in the intestine of 20ºC animals. What is the turnover rate of GFP in the intestine and how is it affected by the temperature shift? If GFP degradation is inhibited, could it explain the increase in signal in 4ºC animals, rather than increased transcription? This seems to be true for the hsp-4 transcriptional reporter, as the GFP fluorescence appears to increase during 4ºC incubation (Fig. 4a), but the hsp-4 message levels are only increased after 1 day but not in later days at 4ºC, based on the RNAseq in provided dataset. How well do changes in lips-11 reporter fluorescence correspond to the changes in the endogenous lips-11 transcript?

      Note that increased GFP fluorescence is accompanied by increased mRNA levels. In addition to the RNAseq data, we have now also examined changes in the endogenous lips-11 transcript by RT-qPCR and observed its strong (and IRE-1-dependent) upregulation in the cold; see Figure 4 – figure supplement 2. Moreover, we have now included two other examples of GFP-tagged proteins whose fluorescence increases in the cold, concomitant with increased mRNA levels and ribosomal occupancy (Figure 2 – figure supplement 2A-B).

      (2) Descriptions of methods to measure different aspects of translation are very abbreviated and in some places make it difficult to understand the paper. One example - what is RFP in Fig. 2a?

      We have now replaced “RFP” with “RPF” (ribosome-protected fragment), and the abbreviation is explained the first time it is used.

      (3) How was the effectiveness of RNAi at 4ºC validated?

      As explained in Methods, we subjected animals to RNAi long before they were transferred to 4ºC, so the corresponding protein is depleted prior to cooling.

      (4) Several of the conclusions on translation and ribosomal occupancy are written in a somewhat confusing way. For example, the authors state that "shift from 10ºC to 4ºC had a strong effect" when describing "impact on translation (ribosomal occupancy)" (page 4), but in the next sentence, they state "a good correlation between mRNA levels and translation (Figure 2A)". Was ribosomal occupancy normalized to the transcript abundance?

      We do not perceive any discrepancy between the two statements. The former refers to the difference between time points, where we observed the largest change in both the transcriptome and ribosomal occupancy from 10ºC to 4ºC (as can be inferred from the PCA plot in Figure 2 - figure supplement 1). The latter refers to the observation that changes in mRNA levels were, in most cases, mirrored by similar changes in ribosomal occupancy.

      The ribosomal occupancy was not normalized, as that would essentially normalize the y-axis (ribosomal occupancy) with the x-axis (mRNA), and so express changes in “translational efficiency” as a function of changes in mRNA abundance. While this type of analysis can also reveal interesting biological phenomena, it would explore a different question.
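      The distinction drawn above can be illustrated with a minimal sketch. It assumes per-gene read counts for mRNA and ribosome-protected fragments at 10ºC and 4ºC; the gene names, the counts, the pseudocount and the function names are all hypothetical and do not come from the study. The sketch only shows that subtracting the mRNA change from the occupancy change yields a different quantity (a change in "translational efficiency") from the occupancy change itself.

```python
import math

# Hypothetical pseudocount to avoid taking the log of zero; not a value from the study.
PSEUDOCOUNT = 1.0


def log2_fold_change(cold, warm, pseudocount=PSEUDOCOUNT):
    """Log2 ratio of a count at 4C over the matching count at 10C."""
    return math.log2((cold + pseudocount) / (warm + pseudocount))


def occupancy_vs_efficiency(counts):
    """For each gene, report the change in mRNA, the change in ribosomal
    occupancy (RPF), and the occupancy change normalized to the mRNA change."""
    results = {}
    for gene, c in counts.items():
        d_mrna = log2_fold_change(c["mrna_4C"], c["mrna_10C"])
        d_rpf = log2_fold_change(c["rpf_4C"], c["rpf_10C"])
        # Normalizing occupancy to mRNA amounts to subtracting the log changes.
        results[gene] = {"d_mrna": d_mrna, "d_rpf": d_rpf, "d_te": d_rpf - d_mrna}
    return results


# Toy counts, invented purely for illustration.
toy_counts = {
    # Occupancy rises in step with mRNA: no change in efficiency.
    "gene_a": {"mrna_10C": 99, "mrna_4C": 399, "rpf_10C": 49, "rpf_4C": 199},
    # Occupancy rises while mRNA stays flat: efficiency increases.
    "gene_b": {"mrna_10C": 99, "mrna_4C": 99, "rpf_10C": 49, "rpf_4C": 199},
}

summary = occupancy_vs_efficiency(toy_counts)
for gene, values in summary.items():
    print(gene, {name: round(value, 2) for name, value in values.items()})
```

      In this toy case both genes show the same increase in occupancy, but only the second shows a change once occupancy is normalized to mRNA, which is why the two analyses address different questions.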

      (5) "For most transcripts ... increased the abundance of a particular protein appears to correlate depend primarily on the abundance of its mRNA" (page 5). This is an overstatement, the protein levels were not quantified.

      As explained above, we have now additionally monitored the expression of two GFP-tagged proteins (CEBP-1 and NUMR-1) and observed the expected increase in GFP fluorescence in the cold (see Figure 2 – figure supplement 3 A-B). While we did not also examine them by western blot, these observations are in line with our conclusions.

      (6) The statement "Since transcription is the main determinant of mRNA levels, these results suggest that cold-specific gene expression primarily depends on transcription activation" seems to assume that message degradation doesn't have much of an impact at 4ºC. What is the evidence here? The authors themselves later suggest either transcription or mRNA stability in Discussion.

      While we cannot exclude that the mRNA stability of some genes may be affected, this concern is more valid for the messages that go down in the cold. Although we have done it for only selected genes, each time we observed an increase in the mRNA levels, we also observed a corresponding increase in the protein (this study and Pekec et al., 2022). In addition, the lips-11 reporter was designed to monitor the activity of its promoter, which we showed is sufficient to upregulate reporter GFP in the cold. We have now expanded the corresponding paragraph in Discussion, which will hopefully come across as more balanced.

      Reviewer #2:

      (1) Alter title, conclusions to better reflect specific nature of the work.

      We have now provided additional data and feel that they justify our conclusions and title.

      (2) Use Gene Ontology searches to look at patterns of gene expression in RNA seq data.

      We now show it in Figure 2 – figure supplement 2.

      (3) Use genetic or lipidomic tools rather than solely adding exogenous lipids.

      We agree that lipidomic analysis is an important direction for future research, but feel that lipidomic analysis and further genetic experiments go beyond the scope of the current manuscript.

      Reviewer #3:

      To strengthen the evidence for the role of IRE-1 in cold adaptation, the authors might consider performing additional functional assays, such as testing the effects of IRE-1 and XBP-1 mutations under varying cold conditions and testing the genetic interaction of ire-1 with xbp-1, skn-1, and hsf-1 in cold sensitivities. It is also worth using alternative approaches such as independent alleles of ire-1, knockdowns or tissue-specific knockouts (without potential developmental compensation in global constitutive mutants) to better characterize the contribution of IRE-1 to cold adaptation. Additionally, studies that examine tissue-specific responses to cold exposure could provide important insights, as different tissues may utilize distinct molecular pathways to adapt to cold stress.

      We also tested ire-1 and xbp-1 functions by RNAi-mediated depletion. SKN-1 is a good candidate for future studies, but Horikawa et al. (2024) showed that HSF-1 is not required for cold dormancy (at 4ºC); we also now show that HSF-1::GFP does not increase in the cold (Figure 2 – figure supplement 3C).

      This reviewer also recommends clarifying the novelty of your findings in the context of existing literature, particularly regarding the established roles of IRE-1 and XBP-1 in responding to endoplasmic reticulum stress.

      The entry point of this study was to clarify a long-standing problem in hibernation research, i.e., the apparent discrepancy between a global translation repression and de novo gene expression observed in the cold. By connecting cold-mediated expression of some genes to the IRE-1/XBP-1 pathway, we strengthen the argument for transcription-mediated gene regulation in hibernating animals. We did go the extra mile to test the possible reason behind the activation of UPR<sup>ER</sup> in the cold but feel that a deeper analysis deserves a separate study.

      The term "hibernation" should be avoided or reworded since the study does not provide direct behavioral or physiological evidence for hibernation-like states; instead, the manuscript could refer to "cold-induced responses" or "adaptations to cold temperatures."

      The term “hibernation” has been used before even in the context of the C. elegans dauer state, which, arguably, is even less appropriate. In addition to a global suppression of translation shown here, we reported before that the same cooling regime suppresses ageing (Habacher et al., 2016; Figure S1C). Incubating at 4ºC also arrests C. elegans development (Horikawa et al., 2024). Thus, while the worm and mammalian hibernation are certainly not equivalent – which we clearly spell out – we like to use “hibernation” interchangeably with “cold dormancy” to draw attention to a fascinating aspect of C. elegans biology. Still, we now use quotation marks in the title to avoid misunderstanding.

      The discussion could be strengthened by addressing the relevance of prior studies, such as those linking IRE-1 to SKN-1 (PMID: 27540856), TRPA-1 (PMID: 23415228), ZIP-10 (PMID: 29664006), HSF-1 (PMID: 38987256) in cold adaptation and elaborating on how your findings provide new

      The IRE-1/SKN-1 and ZIP-10 papers are now mentioned when describing the model in Figure 7. The TRPA-1 and HSF-1 papers are cited when discussing physiological differences between different cold temperatures. Consistent with our studies, the HSF-1 paper shows that nematodes enter a dormant state at 4ºC (but continue developing at 9ºC and higher temperatures). Importantly, HSF-1 promotes development at 9ºC but is not important for the arrest at 4ºC. We also now show in Figure 2 – figure supplement 3C that HSF-1 does not go up at 4ºC.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) The authors conclude that the committed progenitors revert to GSCs based on the coexpression of nanos2 and foxl2l and based on expression of id1 in mutants but not in WT. Without functional data demonstrating that the progenitors revert to an earlier state, alternative interpretations should be considered. For example, it is possible that the cells initiate the committed progenitor program but continue to express the GSC program and that the coexpression of both programs blocks differentiation.

      Thanks for your insightful comment. We have explored possible alternative interpretations of our data. Regarding the suggested possibility of a continued GSC program in the mutant, we have examined the expression of GSC markers including nanos2 in the mutant at different stages. We found that in the mutant, nanos2 and other GSC markers were not significantly upregulated in the GSC-to-progenitor transition (G-P) and early progenitors (Prog-E) (Fig. 4B). The expression of these GSC markers was also low in the integrated clusters I4-I6, in which G-P and Prog-E stages were prominent (Fig. 3D and Fig. 3E). The GSC marker nanos2 was high only in mutant Prog-C. These results argue against continued GSC programs in the foxl2l mutants. Another possible explanation is that some mutant Prog-C cells acquire GSC properties through the upregulation of nanos2, rather than maintaining a continuous GSC program. We have now clarified our rationale about mutant cells gaining new GSC properties and included both interpretations in the Results.

      Consistent with this possibility, some Fox family members, FoxL2 and FoxPs for example, are known to be both activators and repressors of transcription or act primarily as repressors. Potentially relevant to this work, repressive activity of FoxL2 has been previously reported in the mammalian ovary (Pisarska et al Endocrinology 2004, Pisarska Am J. Phys Endo. Metabolism 2010, Kuo Reproduction 2012, Kuo Endocrinology 2011, as well as more recent publications). In that context interfering with FoxL2 was proposed to cause upregulated expression of genes normally repressed by FoxL2, accelerated follicle recruitment, and premature ovarian failure.

      FoxL2 exerts both activating and repressive activities. We believe that Foxl2l can also both activate and repress the expression of its target genes. Although its target genes have not been clearly identified, Foxl2l may activate genes involved in processes such as oogenic meiosis, and may also repress genes involved in other processes, possibly including nanos2.

      (2) The authors conclude that the committed progenitor stage is "the gate toward female determination" and that the cells "stay at S-Phase temporarily before differentiation". This conclusion seems to be based solely on single cell RNAseq expression. In several species, including zebrafish, meiotic entry occurs earlier in females and has been correlated with ovary development. The possibility that the late progenitor stage, the stage when meiotic genes are detected in this study and a stage missing in foxl2l mutants, is actually the key stage for female determination cannot be excluded by the data provided.

      We agree that Prog-L is important for the initiation of female meiosis. We have revised the text to point out the importance of Prog-L in female differentiation.

      (3) The authors discuss prior work showing that loss of germ cells leads to male development and that germ cells are required for female development and claim to extend that work by showing here that some progenitors are already sexually differentiated. First, the stages compared are completely different. The earlier work looks at the primordial germ cells and their loss in the first few days of development before a gonad forms. In contrast, this work examines stages well after the gonad has formed and during sex determination.

      Both previous studies and our study indicate the important role of germ cells in zebrafish sex differentiation during gonadal development. The earlier works show that the abundance of primordial germ cells contributes to sex differentiation. Our current findings further suggest the existence of female identity in some germ cells at the juvenile stage, and we discuss the importance of germ cells in sexual differentiation. We have added the developmental age in our study to emphasize the age difference.

      The second concern is that the conclusion that the progenitors are differentiated is based solely on the expression of foxl2l, which is initially expressed in the juvenile ovary state that lab strains have been shown to develop through (Wilson et al Front Cell Dev Bio 2024). While it is fair to state that some cells express ovary markers at this stage, it is unclear that this is sufficient evidence that the cells are differentiated.

      The conclusion about the differentiation of progenitors is not based solely on foxl2l expression; rather, it is based on the whole transcriptomic profiles of both WT (Figure 1B) and foxl2l mutant cells (Figure 3A) as well as the foxl2l mutant phenotype (Figure 2C). Three types of progenitors, Prog-E, Prog-C and Prog-L, were identified by whole transcriptomic analysis in WT. In foxl2l mutants, the transcriptomic profile further shows that Prog-L and meiotic cells are completely lost, and all germ cells eventually undergo male differentiation. Together, these results indicate that the differentiation of Prog-C to Prog-L guides the progenitor toward female differentiation. Our results also showed that in the juvenile gonad, foxl2l expression is high in two types of progenitors, Prog-C and Prog-L, and becomes low after meiotic entry.

      For example, in the context of the foxl2l mutant, the authors observe that GSCs and early progenitors inappropriately express foxl2l, but the mutants develop as males. Thus, expression of foxl2l transcripts alone is insufficient evidence to claim that the cells are already differentiated as female.

      The foxl2l mutants develop into males because they lack functional Foxl2l. Although the mutated foxl2l transcript is present in mutant cells, these transcripts are not functional. These mutants develop into males eventually. This result is consistent with our claim that functional Foxl2l is important for the development of Prog-L and female differentiation.

      (4) The comparison between medaka and zebrafish foxl2l mutants seems to suggest that Foxl2l is required for meiosis in medaka but has a different role in zebrafish. However, if foxl2l represses the earlier developmental programs of GSCs and early progenitors, it is possible that continued expression of these early programs interferes with activation of meiotic genes. This could account for the absence of the late progenitor stage in foxl2l mutants since the late progenitor stage is defined by and distinguished from the earlier stages by expression of foxl2l and meiotic genes. If so, foxl2l may be similarly required in both systems.

      Medaka and zebrafish Foxl2l may share similar functions such as the stimulation of meiotic gene expression and promotion of oogenesis in the female germ cells preparing for meiotic entry. In addition, we also detected aberrant upregulation of nanos2 in some foxl2l mutant cells. The idea that “continued expression of these early programs interferes with activation of meiotic genes” is conceivable, but for now we have no evidence for it. We do not know whether the absence of meiotic genes is due to an interference caused by the activation of nanos2 or due to the complete loss of Prog-L and meiotic cells. It will also be interesting to find out whether medaka Foxl2l has a role in early progenitors.

      (5) The authors state that "Foxl2l may ensure female differentiation by preventing stemness and antagonizing male development." It is unclear why suppressing stemness would be necessary for female differentiation since female zebrafish have stem cells as do male zebrafish. It seems likely that turning off the GSC and early differentiation programs is important for allowing expression of meiosis and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes.

      It is true that we have not proved whether suppression of stemness is required for female differentiation. Maybe our earlier statement is a bit misleading. We agree that it is likely that turning off the GSC and early differentiation programs is important for allowing expression of meiotic and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes. To avoid confusion, we have modified our statement in the text.

      (6) Based on its expression in mutant progenitors, p53 is proposed to assist with alternative differentiation of mutant germ cells. Although p53 transcripts are expressed, no evidence is provided that p53 is involved in differentiation of germ cells, and sex bias has not been associated with the published p53 mutants in zebrafish. Furthermore, while p53 has been shown to be important for ovary to testis transformation in mutant contexts in adults, it appears dispensable for testis development in mutants that disrupt ovary differentiation in earlier stages (Rodriguez-Mari et al PLoS Gen 2010, Shive PNAS 2010, Hartung et al Mol. Reprod. Dev 2014, Miao Development 2017, Kaufman et al PLoS Gen 2018, Bertho et al Development 2021). It is possible that p53 eliminates foxl2l mutant germ cells that are simultaneously expressing multiple developmental programs, but this possibility would need to be tested.

      The tp53<sup>-/-</sup>foxl2l<sup>-/-</sup> double mutation does not alleviate the all-male phenotype of the foxl2l<sup>-/-</sup> mutant (Dev Biol, 517, 91-99, 2024), indicating that the male development is not due to p53-mediated germ cell apoptosis. We have cited the suggested papers and compared the role of tp53 among the mutants (fancl, zar1, etc.) described in those papers. Since tp53 was enriched in certain foxl2l<sup>-/-</sup> mutant cell clusters, and tp53 mutation fails to rescue the all-male phenotype, it is possible that p53 expressed in these mutant cell clusters has roles other than inducing apoptosis. One assumption is that p53 may be involved in germ cell differentiation, especially since p53 is known to promote the differentiation of airway epithelial progenitors and embryonic stem cells, as well as adipogenesis. We have emphasized in the Discussion that the suggested role of p53 in germ cell differentiation is our assumption.

      Reviewer #3 (Public Review):

      This is the first report to show that a transcription factor, foxl2l, is essential for the development of female germ cells. Without foxl2l, germ cells develop into sperm. The report also clearly defines the stage at which early germ cells arrest in foxl2l mutants, i.e., the stage at which foxl2l is critical for the further development of female germ cells.

      (1) Due to the lack of cell lineage tracing, the claim that foxl2l suppresses the dedifferentiation of progenitor cells to GSCs, which is based on gene expression and cell number changes, is weak.

      Thanks for your comments pointing out both the contribution and the weaknesses of our work. We acknowledge the lack of direct evidence for the reversion of mutant Prog-C to GSC in our data. We have now removed the claim about the repression of stemness by Foxl2l.

      (2) In addition, separation of early germ cell types in foxl2l mutant using marker genes from WT may not be optimal.

      The cell types of mutant cells were determined by two independent analyses. The first infers the developmental stage of mutant cells. This approach assumes that mutant cells can indeed be mapped to specific WT stages through their transcriptomic profiles. However, as indicated by this reviewer’s comments, mutant cells exhibited heterogeneity and can be distinct from WT cells, so defining cell types in mutants by WT markers may not be optimal. To address this, we conducted a second analysis, co-clustering. Mutant cells and WT cells at early stages (GSC, G-P, Prog-E, Prog-C(S) and Prog-C) were co-clustered. This approach does not assume a direct correspondence between mutant and WT developmental stages. Instead, it facilitates the identification of novel germ cell types in mutants while characterizing the relationship between WT and mutant cells. In some clusters, both WT and mutant cells were present, indicating high transcriptomic similarity. In other clusters, nearly all cells were mutant cells, indicating distinct mutant cell types (Figure 3C). We can, therefore, assign developmental properties to these mutant cells with confidence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this study is to test the overarching hypothesis that plasticity in BNST CRF neurons drives distinct behavioral responses to unpredictable threat in males and females. The manuscript provides evidence for a possible sex-specific role for CRF-expressing neurons in the BNST in unpredictable aversive conditioning and subsequent hypervigilance across sexes. As the authors note, this is an important question given the high prevalence of sex differences in stress-related disorders, like PTSD, and the role of hypervigilance and avoidance behaviors in these conditions. The study includes in vivo manipulation, bulk calcium imaging, and cellular resolution calcium imaging, which yield important insights into cell-type specific activity patterns. However, it is difficult to generate an overall conclusion from this manuscript, given that many of the results are inconsistent across sexes and across tests and there is an overall lack of converging evidence. For example, partial conditioning yields increased startle in males but not females, yet, CRF KO only increases startle response in males after full conditioning, not partial, and CRF neurons show similar activity patterns between partial and full conditioning across sexes. Further, while the study includes a KO of CRF, it does not directly address the stated aim of assessing whether plasticity in CRF neurons drives the subsequent behavioral effects of unpredictable threat.

      We appreciate the reviewer’s summary and agree that there is a large amount of complexity to the results, and that it was difficult to generate a simple model/conclusion to summarize our work. This is the unfortunate side effect of looking across both sexes at different conditioning paradigms; however, we believe that it is important to convey this information to the field even without a simple answer. Our data reinforce the very important findings from the Maren and Holmes groups that partial fear is a different process than full fear, and that the BNST plays a differential role here. We have reworded the manuscript to better convey this complexity.

      A major strength of this manuscript is the inclusion of both males and females and attention to possible behavioral and neurobiological differences between them throughout. However, to properly assess sex-differences, sex should be included as a factor in ANOVA (e.g. for freezing, startle, and feeding data in Figure 1) to assess whether there is a significant main effect or interaction with sex. If sex is not a statistically significant factor, both sexes should be combined for subsequent analyses. See, Garcia-Sifuentes and Maney, eLife 2021 https://elifesciences.org/articles/70817. There are additional cases where t-tests are used to compare groups when repeated measures ANOVAs would be more appropriate and rigorous.

      We agree with the reviewer that this is the more appropriate analysis and have changed the analysis and figures throughout the revised manuscript to better assess sex differences as well as differences between fear conditions.

      Additionally, it's unclear whether the two sexes are equally responsive to the shock during conditioning and if this is underlying some of the differences in behavioral and neuronal effects observed. There are some reports that suggest shock sensitivity differs across sexes in rodents, and thus, using a standard shock intensity for both males and females may be confounding effects in this study.

      This is a great point. We have conducted the appropriate analysis (Sex by Tone repeated-measures two-way ANOVAs for each of the groups: Ctrl, Full, Part) and there are no sex differences in freezing between males and females. The extent of conditioning is not different between the groups, suggesting that if there was a difference in shock sensitivity, it is not driving any discernible differences in behavioral performance. However, it is possible that the experience of the shock differs for the animals even in the absence of any measurable behavior.

      The data do not rule out that BNST CRF activity is purely tracking the mobility state of the animal, given that the differences in activity also track with differences in freezing behavior. The data show an inverse relationship between activity and freezing. This may explain a paradox in the data, which is why males show a greater suppression of BNST activity after partial conditioning than full conditioning, if that activity is suspected to drive the increased anxiety-like response. Perhaps it reflects that activity is significantly suppressed at the end of the conditioning session because animals are likely to be continuously freezing after repeated shock presentations in that context. It would also explain why there is less of a suppression in activity over the course of the recall session, because there is less freezing as well during recall compared with conditioning.

      While it is possible that the BNST may be tracking activity, we believe it is not purely tracking mobility state. For instance, while freezing increases across tone exposures in Part fear regardless of sex, males show an increase while females show a reduction in BNST response during tone 5 (Fig 2K). The data the reviewer refers to showing the inverse relationship with BNST activity and freezing would have suggested the opposite response if it were purely tracking the mobility state of the animal. This is also the case with BNST<sup>CRF</sup> activity to first and last tone during recall. Despite the suppression of activity over the course of recall (Fig 5K), we see an increase in BNST<sup>CRF</sup> tone response when comparing tone 1 and 6 in males and a decrease in females (Fig 6M), again suggesting the BNST is responding to more than just activity.

      A mechanistic hypothesis linking BNST CRF neurons, the behavioral effects observed after fear conditioning, and manipulation of CRF itself are not clearly addressed here.

      We disagree with this assertion. The data suggest a model in which males respond with increased arousal and Part fear males show persistent activation of the BNST and BNST<sup>CRF</sup> neurons during fear conditioning and recall, while female Part fear mice show the opposite response. This female response differs from what the field believes to be the role of the BNST in sustained fear. Additionally, we show that CRF knockdown is not involved in fear differentiation or fear expression in males, while it enhances fear learning and recall in females. We have reworded the manuscript to highlight these novel findings.

      Reviewer #2 (Public Review):

      This study examined the role of CRF neurons in the BNST in both phasic and sustained fear in males and females. The authors first established a differential fear paradigm whereby shocks were consistently paired with tones (Full) or only paired with tones 50% of the time (Part), or controls who were exposed to only tones with no shocks. Recall tests established that both Full and Part conditioned male and female mice froze to the tones, with no difference between the paradigms. Additional studies using the NSF and startle test, established that neither fear paradigm produced behavioral changes in the NSF test, suggesting that these fear paradigms do not result in an increase in anxiety-like behavior. Part fear conditioning, but not Full, did enhance startle responses in males but not females, suggesting that this fear paradigm did produce sustained increases in hypervigilance in males exclusively.

      Thank you for this clear summary of the behavioral work.

      Photometry studies found that while undifferentiated BNST neurons all responded to shock itself, only Full conditioning in males led to a progressive enhancement of the magnitude of this response. BNST neurons in males, but not females, were also responsive to tone onset in both fear paradigms, but only in Full fear did the magnitude of this response increase across training. Knockdown of CRF from the BNST had no effect on fear learning in males or females, nor any effect in males on fear recall in either paradigm, but in females enhanced both baseline and tone-induced freezing only in Part fear group. When looking at anxiety following fear training, it was found in males that CRF knockdown modulated anxiety in Part fear trained animals and amplified startle in Fully trained males but had no effect in either test in females. Using 1P imaging, it was found that CRF neurons in the BNST generally decline in activity across both conditioning and recall trials, with some subtle sex differences emerging in the Part fear trained animals in that in females BNST CRF neurons were inhibited after both shock and omission trials but in males this only occurred after shock and not omission trials. In recall trials, CRF BNST neuron activity remained higher in Part conditioned mice relative to Full conditioned mice.

      Overall, this is a very detailed and complex study that incorporates both differing fear training paradigms and males and females, as well as a suite of both state of the art imaging techniques and gene knockdown approaches to isolate the role and contributions of CRF neurons in the BNST to these behavioral phenomena. The strengths of this study come from the thorough approach that the authors have taken, which in turn helped to elucidate nuanced and sex specific roles of these neurons in the BNST to differing aspects of phasic and sustained fear. More so, the methods employed provide a strong degree of cellular resolution for CRF neurons in the BNST. In general, the conclusions appropriately follow the data, although the authors do tend to minimize some of the inconsistencies across studies (discussed in more depth below), which could be better addressed through discussion of these in greater depth. As such, the primary weakness of this manuscript comes largely from the discussion and interpretation of mixed findings without a level of detail and nuance that reflects the complexity, and somewhat inconsistency, across the studies. These points are detailed below:

      - Given the focus on CRF neurons in the BNST, it is unclear why the photometry studies were performed in undifferentiated BNST neurons as opposed to CRF neurons specifically (although this is addressed, to some degree, subsequently with the 1P studies in CRF neurons directly). This does limit the continuity of the data from the photometry studies to the subsequent knockdown and 1P imaging studies. The authors should address the rationale for this approach so it is clear why they have moved from broader to more refined approaches.

      The reviewer raises a good point. We did some preliminary photometry studies with BNST CRF neurons and found that the time-locked signal was poor. We reasoned that this was due to the heterogeneity of the cell activity, as we saw in our previous publication (Yu et al). Because of this, we moved to the 1P imaging work in place of continued BNST CRF photometry. We have also reworded the manuscript to better discuss the complexities and inconsistencies in findings across the studies.

      - The CRF KD studies are interesting, but it remains speculative as to whether these effects are mediated locally in the BNST or due to CRF signaling at downstream targets. As the literature on local pharmacological manipulation of CRF signaling within the BNST seems to be based largely on males, adding pharmacological studies here would help to resolve whether these changes are indeed mediated by local impairments in CRF release within the BNST. While it is not essential to add these experiments, the manuscript would benefit from a clearer description of what pharmacological studies could be performed to resolve this issue.

      We agree with the reviewer that the addition of this experiment would be highly informative for differentiating the role of CRF in the BNST. This is something that will need to be considered moving forward and we have added this as a point of discussion.

      - While I can appreciate the authors' perspective, I think it is more appropriate to state that startle correlates with anxiety as opposed to outright stating that startle IS anxiety. Anxiety by definition is a behavioral cluster involving many outputs, of which avoidance behavior is key. Startle, like autonomic activation, correlates with anxiety but is not the same thing as a behavioral state of anxiety (particularly when the startle response dissociates from behavior in the NSF test, which more directly tests avoidance and apprehension). Throughout the manuscript, anxiety and vigilance are used interchangeably to describe startle, but the authors also dissociate these two, such as in the first paragraph of the discussion when stating that the Part fear paradigm produces hypervigilance in males without influencing fear or anxiety-like behaviors. The manuscript would benefit from harmonization of the language used to operationally define these behaviors, and my recommendation would be to remain consistent with the description that startle represents hypervigilance and not anxiety, per se.

      The reviewer raises an excellent point; we have clarified this language in the revised manuscript.

      - The interpretation of the anxiety data following CRF KD is somewhat confusing. First, while the authors found no effect of fear training on behavior in the NSF test in the initial studies, now they do. However, somewhat contrary to what one would expect, they found that Full fear trained males had reduced latency to feed (indicative of an anxiolytic response), which was unaltered by CRF KD, whereas in Part fear (which appeared to have no effect on its own in the NSF test), KD of CRF produced an anxiolytic effect. Given that the Part fear group was no different from control here, it is difficult to interpret these data, as CRF KD now reduces latency to feed in this group, suggesting that removal of CRF somehow conveys an anxiolytic response for Part fear animals. In the discussion the authors refer to this outcome as CRF KD "normalizing" the behavior in the NSF test of Part fear conditioned animals, as it now parallels what is seen after Full fear, but given that the Part fear animals with GFP were no different than controls (and neither of these fear training paradigms produced any effect in the NSF test in the first arm of studies), it seems inappropriate to refer to this as "normalization", as it is unclear how this is now normalized. Given the complexity of these behavioral data, greater depth in the discussion is required to put these data in context and describe the nuance of these outcomes. In particular, a discussion of possible experimental factors that differed between the initial behavioral studies and those in the CRF KD arm and could explain the discrepancy in the NSF test would be good (such as the inclusion of surgery, or other factors that may have differed between these experiments). These behavioral outcomes are even more complex given that the opposite effect was found in startle, whereby CRF KD amplified startle in Full trained animals. As such, this portion of the discussion requires some reworking to more adequately address the complexity of these behavioral findings.

      The reviewer raises a good point, and we agree that there are many inconsistencies in the behaviors. We believe it is still worthwhile to show these results but have expanded the manuscript's discussion of potential reasons for these behavioral inconsistencies.

      Reviewer #3 (Public Review):

      Hon et al. investigated the role of BNST CRF signaling in modulating phasic and sustained fear in male and female mice. They found that partial and full fear conditioning had similar effects in both sexes during conditioning and during recall. However, males in the partially reinforced fear conditioning group showed enhanced acoustic startle compared to the fully reinforced fear conditioning group, an effect not seen in females. Using fiber photometry to record calcium activity in all BNST neurons, the authors show that the BNST was responsive to foot shock in both sexes and both conditioning groups. Shock response increased over the session in males in the fully conditioned fear group, an effect not observed in the partially conditioned fear group. This effect was not observed in females. Additionally, tone onset resulted in increased BNST activity in both male groups, with the tone response increasing over time in the fully conditioned fear group. This effect was less pronounced in females, with partially conditioned females exhibiting a larger BNST response. During recall in males, BNST activity was suppressed below baseline during tone presentations and was significantly greater in the partially conditioned fear group. Both female groups showed an enhanced BNST response to the tone that slowly decayed over time. Next, they knocked down CRF in the BNST to examine its effect on fear conditioning, recall, and anxiety-like behavior after fear. They found no effect of the knockdown in either sex or group during fear conditioning. During fear recall, BNST CRF knockdown led to an increase in freezing only in the partially conditioned females. In the anxiety-like behavior tasks, BNST CRF knockdown led to increased anxiolysis in the partially reinforced fear males, but not in females. Surprisingly, BNST CRF knockdown increased startle response in fully conditioned, but not partially conditioned, males, an effect not observed in either female group. In a final set of experiments, the authors used single-photon calcium imaging to record BNST CRF cell activity during fear conditioning and recall. Approximately 1/3 of BNST CRF cells were excited by shock in both sexes, with the rest inhibited, and no differences were observed between sexes or groups during fear conditioning. During recall, BNST CRF activity decreased in both sexes, an effect pronounced in male and female fully conditioned fear groups.

      Overall, these data provide novel, intriguing evidence for how BNST CRF neurons may encode phasic and sustained fear differentially in males and females. The experiments were rigorous.

      We thank you for this positive review of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are several graphs representing different analyses of (presumably) the same group of subjects, but which have different N/group. For example, in Figure 2:

      (1) Fig 2P seems to have n=10 in Part Male group (Peak), but 2Q only has n=9 in Part Male group (AUC)

      (2) Fig 2S seems to have n=10 in Part Female group (Peak), but 2T only has n=7 in Part Female group (AUC)

      (3) Fig 2G (Tone Resp) has n=6 Full Males but 2F (Tone Resp), 2H (Shock Resp), and 2I (Shock Resp) have n=7 Full Males

      (4) Fig 2K (Tone Resp) has n=7 Full Females but 2L (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=8 Full Females

      (5) Fig 2L (Tone Resp) has n=9 Part Females but 2K (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=10 Part Females

      It's possible that this is just due to overlapping individual data points which are made harder to see due to the low resolution of the figures. If so, this can be easily rectified. However, there may also be subjects missing from some analyses which must be clarified or corrected.

      We thank you for catching these. We have gone through and fixed any issues with data points and have added statistics and exclusions in datasets to figure legends to further explain inconsistencies.

      Regarding statistical tests:

      (2) Data in Figs 2G and 2I should be analyzed using a two-way RM ANOVA.

      We have now included sex as a factor in most of our analysis and are now using appropriate statistical tests.

      (3) Data in Fig 3K should be analyzed using a two-way RM ANOVA.

      We are now using appropriate statistical tests.

      Calcium activity in response to the shock during conditioning and in response to the tone during recall should be included in Figure 5. Given partial and full animals also receive unequal presentations of the cue, it would be useful to see the effects trial by trial or normalized to the first 3 presentations only.

      The reviewer raises a great point. We have changed this figure and have now added the response to shock and tones. Since we are most interested in the difference between sustained and phasic fear, we decided to compare tone 3 in Full fear and tone 4 in Part fear, which differ in the ambiguity of their cue and only have one tone difference.

      Histology maps should be included for all experiments depicting viral spread and implant location for all animals, in addition to the included representative histology images. These can be placed in the supplement.

      We agree this is helpful. While we have confirmed all of the experiments are hits, the tissue is no longer in condition for this analysis.

      Referring to the quantification of peaks in fiber photometry and cellular resolution calcium imaging data as "spikes" is a bit misleading given the inexact relationship between GCaMP sensor dynamics/calcium binding and neuronal action potentials; perhaps calling it "event" frequency would be clearer.

      We have changed references to "spikes" to "events" as suggested.

      The legend for Figure 2S is mislabeled as A.

      Thank you for catching this mistake; it has been fixed.

      The methods refer to CRFR1 fl/fl animals but it seems no experiments used these animals, only CRF fl/fl.

      We have fixed this, thank you.

      Reviewer #2 (Recommendations For The Authors):

      As stated in the public review, I think the addition of local pharmacological studies blocking CRF1 and CRF2 receptors in the BNST in both males and females, done under the same conditions as all of the other testing herein, would help to resolve some of the speculation in interpreting the CRF KD data. I don't think these studies are essential, but it would be good for the authors to state more explicitly what studies could be done and how they could facilitate interpretation of these data.

      Thank you for this suggestion. We have added this discussion into the manuscript.

      Aside from this, my other recommendations for the authors are to more clearly address the discrepancies in behavioral outcomes across studies, to explicitly describe their rationale for the sequence of experiments performed, and to harmonize how they operationally define anxiety.

      Again, we appreciate these great suggestions. We have added more discussion on the behavioral discrepancies as well as rationale for the experiments. We have also changed the wording to remain consistent that the NSF test relates to anxiety and the Startle test relates to vigilance.

      - In Figure 2, Panel S is listed as Panel A in the caption and should be corrected.

      Thank you for catching this mistake; we have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      My biggest concerns regard the interpretations and some conclusions from this data set, which I have stated below.

      (1) It was surprising to see minimal and somewhat conflicting behavioral effects due to BNST CRF knockdown. The authors provide a representative image and address this in the conclusion. They mention the role of local vs projection CRF circuits as well as the role of GABA. I don't think those experiments are necessary for this manuscript. However, it may be worthwhile to examine, through in situ hybridization or IHC, BNST CRF levels after both full and partial conditioned fear paradigms. Additionally, it would help to see a quantification of the knockdown in the animals.

      Thank you for these great suggestions. We will consider these for future experiments. We piloted some CRF sensor experiments to probe this, but it was unclear whether the signal-to-noise ratio of the sensor was sufficient. We hope to do more of this in the future if we manage to get funding for this work.

      The authors can add a figure showing deltaF/F changes from control.

      We did not have control mice in these in vivo experiments. Our main interests lie in understanding the differences between the Full and Part fear conditioning paradigms specifically.

      (2) Related to the previous point, it was surprising to see an effect of the CRF deletion in the full fear group compared to the partial fear group in the acoustic startle task. To strengthen the conclusion about differential recruitment of CRF during phasic and sustained fear, the experiment in my previous point could help elucidate that. Alternatively, intra-BNST administration of a CRF antagonist before the acoustic startle test after both conditioning tasks could also help, as could patching from BNST CRF neurons after the conditioning tasks to measure intrinsic excitability. Not all of these experiments are needed to support the conclusion; they are only examples.

      We thank the reviewer for these suggestions and agree that these are important experiments. We will consider this in future experiments exploring the role of BNST CRF in fear conditioning.

      (3) In Figure 5 F and K, the authors report data combined for both part and full fear conditioning. Were there any differences in the number of excited or inhibited neurons between the conditioning groups?

      We are only looking at the first shock exposure in these figures. These were combined because the first tone and shock exposure is identical in Full and Part fear conditioning. Differences in these behavioral paradigms emerge after Tone 3 exposure, where Part fear does not receive a shock while Full fear does.

      Also, can the authors separate male and female traces in Fig 5 E and P?

      Traces in Fig 5E are from females only. We did not include male traces because males and females had identical responses to the first shock, and we felt only one trace was needed as an example. Traces in Fig 5P are from males. We did not show female traces because females did not show differential effects from baseline to end.

      (4) Also, regarding the calcium imaging data, what was the average length of a transient induced by shock? Were there any differences between the sexes?

      We have many cells in each condition, and the durations of the shock-evoked transients varied and were hard to quantify; for example, some cells were already active before the shock, which makes the length of the transient ambiguous. Therefore, to maintain consistency and reduce ambiguity regarding transient length, we analyzed a fixed 10-second post-shock window across mice and conditions.
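
      For illustration only, a fixed post-event window of this kind could be summarized as sketched below. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function name `post_event_metrics`, the sampling rate, the synthetic trace, and the choice of peak and trapezoidal area under the curve (AUC) as summary metrics are all hypothetical (peak and AUC are mentioned above only in connection with the Figure 2 panels).

```python
def post_event_metrics(trace, sample_rate_hz, event_index, window_s=10.0):
    """Summarize a dF/F trace in a fixed window after an event (e.g. shock onset).

    Hypothetical helper: returns (peak, auc) for the samples spanning
    `window_s` seconds starting at `event_index`.
    """
    n_intervals = int(round(window_s * sample_rate_hz))
    window = trace[event_index:event_index + n_intervals + 1]
    if len(window) < n_intervals + 1:
        raise ValueError("trace is too short for the requested window")

    dt = 1.0 / sample_rate_hz
    peak = max(window)
    # Trapezoidal rule: average adjacent samples and multiply by the time step.
    auc = sum((a + b) / 2.0 * dt for a, b in zip(window, window[1:]))
    return peak, auc


# Synthetic example (assumed values): flat baseline, then a step to 1.0 at the event.
sample_rate_hz = 2.0          # assumed sampling rate, not taken from the manuscript
event_index = 5               # assumed sample index of shock onset
trace = [0.0] * event_index + [1.0] * 30
peak, auc = post_event_metrics(trace, sample_rate_hz, event_index)
```

      Using the same window length for every cell avoids having to define where each transient starts and ends, which is the ambiguity described in the response above; the trade-off is that the window may truncate long transients or include unrelated activity for short ones.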